Writing automated tests is a must-have practice in everyday development work if you want prompt feedback on any change made to your software.

Continuous integration is crucial in everyday software development practice. To get quick feedback on your application's quality, you need a rich test suite that covers as much "application ground" as possible. But what if your automated test suite is so big that it takes a couple of hours to execute before you get that feedback? Even worse, what if it is release day?

If that is the case, parallel test execution is your answer. This article gives an overview of things to consider when parallelizing your tests in any testing framework, along with a practical solution using the RSpec test framework.

Things to consider

Parallel test execution is not something that can be magically turned on and off with a switch. You need to consider how to approach this challenge so that you get real benefit from it and do not become a "slave of test script refactoring". These are the things to consider:

Do you need parallelization at all?

First, analyze the duration of your test suite execution. Is it too slow for your team's needs? Do you have duplicated checks that test the same things in different tests and increase the overall duration of the suite? Refactoring is often the answer before taking further steps toward test parallelization.

Do you have the infrastructure for parallelization?

If your CI environment does not have a multi-core processor, you will not see a real benefit from parallelization; tests will still execute in a single process. If this is the case, can you utilize cloud resources for your tests? If so, you can try solutions like running tests inside an AWS EC2 instance, where you can choose an instance size that provides enough processing power to see a real benefit. Of course, cloud solutions come with resource costs that must be calculated.

Do you have tests capable of being executed in parallel?

This is a problem that often emerges the first time we actually try to execute tests in parallel. Some tests in your suite share the same resources, which makes them unreliable when run in parallel. Whenever you consider parallelization in computer science, there is potential for a race condition. In programming, this is handled with a variety of techniques such as locks, semaphores, and monitors. The analogy holds when creating tests: this time, you need to take care of the resources your tests use. For example, if two tests read the same file, do something, and then check what was written to that file, that is a potential problem when running the tests in parallel. What can emerge is that some of your tests suddenly fail and you get false positives (tests fail even though the application/feature works). As with any multithreading problem, it is hard to reproduce and to figure out what causes it.

Therefore, first make sure that all tests are isolated (each has a setup and teardown process and uses unique resources). This will probably require some refactoring, or even redesigning, of test scripts, but it will eventually save you from the headaches you will certainly have while troubleshooting why some tests fail intermittently. There are approaches (which I will present later in the article) that make it possible to execute tests that share resources after all, but I would recommend them only as a last resort, if refactoring your tests is too expensive for the time being. Advice: tests need to be isolated!

How to parallelize tests with RSpec?

Now that you have a good idea of the problems that can emerge when parallelizing a test suite, the next thing to look at is how to do it with the RSpec test framework (this article focuses on RSpec, but other popular frameworks also provide parallelization solutions). The first thing to look at is the parallel_tests gem (https://github.com/grosser/parallel_tests). Its README provides a lot of information on how to set up tests for parallel execution (primarily unit tests in Rails), but in this article I will focus on running parallel functional RSpec tests that are not necessarily connected to the Rails framework.
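As a minimal sketch of the setup for a plain, non-Rails RSpec project (the gem names are real; the version constraint is only illustrative), the project's Gemfile could look like this:

```ruby
# Gemfile -- minimal sketch for a plain (non-Rails) RSpec project.
# Version constraints are illustrative, not prescriptive.
source "https://rubygems.org"

gem "rspec", "~> 3.0"
gem "parallel_tests"
```

After adding these lines, running bundle install makes the parallel_rspec command available in the project.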

The first thing to do is install the parallel_tests gem (put it inside your project's Gemfile and run bundle install). To run your test suite in parallel, simply run this command:

parallel_rspec spec

This will run all tests inside your spec folder in parallel, using all available processor cores on your CI environment. The parallel_rspec command also offers a couple of useful options:

-n – the number of parallel processes to use

-p – a pattern of tests to run in parallel (this enables running only specific tests in parallel while the rest keep executing sequentially as before)

-s – run all tests matching a pattern in the same, single process (good candidates are tests that use the same resources; we lose the parallelization benefit for those tests, but it saves the time needed for refactoring)

--serialize-stdout – test output is serialized and written only once each process completes. This gives a clean view of the test execution logs, in contrast to the default behavior where output is also written in parallel, which makes troubleshooting much harder

This is a more complex command that uses some of the options that parallel_rspec offers:

parallel_rspec -n 4 --single _np --serialize-stdout spec

This means that tests will be executed in 4 parallel processes, where one process runs only the tests matching the pattern _np, and the output will be serialized and written in a readable way.

Good vs. bad design of RSpec test from a parallelization perspective

These are two examples of test specs with bad and good design from a parallelization perspective. Both examples consist of two specs (first_spec and second_spec).

Bad Example:

def create_first_file
  File.open "output.txt", "w" do |file|
    file.write("Output from FIRST spec")
  end
end

describe "First test" do
  context "This test: " do
    it "passes" do
      create_first_file
      output = []
      puts "File content is:"
      File.open("output.txt", "r") do |f|
        f.each_line do |line|
          puts line
          output << line
        end
      end
      expect(output).to match_array(["Output from FIRST spec"])
    end
  end
end

def create_second_file
  File.open "output.txt", "w" do |file|
    file.write("Output from SECOND spec")
  end
end

describe "Second test" do
  context "This test: " do
    it "passes" do
      create_second_file
      # Simulate some work here...
      sleep 1
      output = []
      puts "File content is:"
      File.open("output.txt", "r") do |f|
        f.each_line do |line|
          puts line
          output << line
        end
      end
      expect(output).to match_array(["Output from SECOND spec"])
    end
  end
end

Both tests do a similar thing: they write different content into the same file and then read it back. Since both tests use the same resource (the same file), no issues occur when the RSpec tests run regularly (in sequence). However, when they run in parallel, potential problems arise because both tests read from the same file while expecting different content. The problem therefore occurs intermittently: tests fail, but not always and not all of them, which makes this kind of problem very hard to debug and reproduce (this example is simplified and the problem is obvious; in real tests, which are more complex, it is much harder to track down). When these two specs collide in a parallel run, one of them fails because output.txt contains the other spec's content.

Good Example:

On the other hand, we have a good example which contains the same tests but with two key differences:

1. Both tests have setup and teardown parts (before and after blocks)

2. In these blocks, each spec creates and deletes its own uniquely named file

require "fileutils"

describe "First test" do
  before(:all) do
    File.open "first.txt", "w" do |file|
      file.write("Output from FIRST spec")
    end
  end

  context "This test: " do
    it "also passes" do
      output = []
      puts "File content is:"
      File.open("first.txt", "r") do |f|
        f.each_line do |line|
          puts line
          output << line
        end
      end
      expect(output).to match_array(["Output from FIRST spec"])
    end
  end

  after(:all) do
    FileUtils.rm("first.txt")
  end
end

require "fileutils"

describe "Second test" do
  before(:all) do
    File.open "second.txt", "w" do |file|
      file.write("Output from SECOND spec")
    end
  end

  context "This test: " do
    it "also passes" do
      # Simulate some work here...
      sleep 1
      output = []
      puts "File content is:"
      File.open("second.txt", "r") do |f|
        f.each_line do |line|
          puts line
          output << line
        end
      end
      expect(output).to match_array(["Output from SECOND spec"])
    end
  end

  after(:all) do
    FileUtils.rm("second.txt")
  end
end

This ensures that parallel execution makes no difference to the test results. The key is that each test uses unique resources. In this example we used a very obvious shared resource (a file); however, when testing a real application, it can be any part of the application (more precisely, any create/update/delete action in your application, such as creating a user).
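Besides hand-picking unique file names, the parallel_tests gem also exposes a TEST_ENV_NUMBER environment variable to each process (documented in its README: it is "" for the first process and "2", "3", ... for the rest), which can be used to derive per-process resource names. A small sketch (the helper name unique_output_path is my own, not part of the gem):

```ruby
# Derive a per-process file name from parallel_tests' TEST_ENV_NUMBER.
# The gem sets it to "" for the first process and "2", "3", ... for the
# others, so each parallel process reads and writes its own file.
def unique_output_path(base = "output")
  suffix = ENV.fetch("TEST_ENV_NUMBER", "")
  "#{base}#{suffix}.txt" # process 1 -> "output.txt", process 2 -> "output2.txt"
end
```

Using such a helper in the before/after blocks above would let even identically written specs stay isolated across processes without renaming files by hand.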

How did we benefit from the parallelization of tests?

Everything said up to now is only theory. In a real-world scenario, we used parallelization to resolve the slow execution of our regression tests. We usually run regression tests against our application a couple of times a week, but we also need to run them as part of the regular release process. Because of that, the long execution time significantly delays the feedback we get on application quality on release day. If an issue is discovered and requires a hotfix, we need to run the regression tests once again. The release process therefore lasts a couple of hours longer than it should, which can lead to missed release deadlines.

The effort to enable parallelization was a bit complicated because a large part of the tests were tightly coupled and not designed from the start to be executed in parallel. That required refactoring and redesigning some tests, while others were too expensive to redesign for this purpose. Therefore, we focused mostly on the long-running tests and parallelized those first.

These are the stats we had before parallelization (CI server):

EC2 instance m1.small: 1xCPU, 1.7GB RAM, 160GB storage (old generation instance)

Number of tests: 120

Execution time: ~ 3 hr 45 min

These are the stats we have after parallelization (CI server):

EC2 instance c3.xlarge (on-demand instance): 4xCPU, 7.5GB RAM, 2x40GB SSD storage

Number of tests run in parallel: 77

Number of tests run in one thread sequentially: 43

Execution time: ~ 1.5 hr

We could afford an expensive AWS instance only because we use it in an "on-demand" way: the instance runs only while tests are being executed. Running it a couple of hours (approximately 10 hours) per month therefore didn't cost much, and it enabled us to have a much more powerful multi-core instance that can be fully utilized for parallel test execution. How to run AWS instances in an "on-demand" way is explained in another article on our blog.

Lessons learned?

Design tests so that they can be parallelized later if needed (the sooner the better)

Think about hardware costs if running tests in the cloud (consider using instances in an on-demand fashion)

Identify long-running tests and parallelize them first, then move on to the ones that don't take much time to execute

Consider if you need parallelization at all (don’t do it just to be fancy).