Learn how to bring your Ruby test suite back to full health, and full speed, with TestProf—a bag of powerful tools to diagnose all test-related problems. This time, we talk about factories: how they can slow down your tests, how to measure that negative impact, how to avoid it, and how to make your factories as fast as fixtures.

TestProf, used on many Evil Martians’ projects to shorten a TDD feedback loop, is a must-have tool for any Rails (or another Ruby-based) application whose tests take more than a minute to run. It works with both RSpec and minitest by extending their functionality.

In our introductory article, where we presented this open source project, we promised to dedicate a whole article to an often overlooked problem with testing Ruby web applications: factory cascades. We are finally making good on our promise.

It is better to explore TestProf by running it on your actual tests. If you have an RSpec-covered Rails project with factory_bot (previously known as factory_girl) factories close at hand, we recommend installing the gem before reading on: this is going to be an interactive walk-through!

Installing TestProf is as easy as adding a single line of code to your Gemfile’s :test group:

group :test do
  gem 'test-prof'
end

Crumbling factories

Whenever we test our applications, we need to generate test data—two common ways to do it are factories and fixtures.

A factory is an object that generates other objects (that may or may not be persisted) according to a predefined schema and does it dynamically. Fixtures represent a different approach: they declare a static state of the data that is loaded into a test database right away and usually persists between test runs.

Fixtures are fast by design. Factories are more popular. We believe factories can compete in performance with fixtures, and even be used as fixtures. Read on to see how.

In Rails world, we have both built-in fixtures and popular third-party factory tools (such as factory_bot, Fabrication, and others).

While “factories vs. fixtures” debate never seems to cease, we consider factories to be a more flexible and a more maintainable way to deal with test data.

However, with great power comes great responsibility: factories make it easier to shoot yourself in the foot and bring your test suite to a crawl.

So how can we tell if we misuse that power or not, and what can we do about it? First, let’s see how much time our test suite spends working in the factories.

For that, we should call our doctor: TestProf. This gem is a bag full of diagnostic tools, EventProf being one of them (in our previous article, we used it to time database interactions). The name gives it away: it is an event profiler that can be told to track the factory.create event, which is fired every time you call FactoryBot.create() and a factory-generated object is saved to the database.

EventProf works with both RSpec and minitest and has a command-line interface, so fire up your terminal in any Rails project folder (it has to have tests and factories, of course, and all examples in this article assume RSpec) and run this line:

$ EVENT_PROF="factory.create" bundle exec rspec

In the output, you see the total time spent on creating records from factories and top five slowest specs:

[TEST PROF INFO] EventProf results for factory.create

Total time: 03:07.353
Total events: 7459

Top 5 slowest suites (by time):

UsersController (users_controller_spec.rb:3) – 00:10.119 (581 / 248)
DocumentsController (documents_controller_spec.rb:3) – 00:07.494 (71 / 24)
RolesController (roles_controller_spec.rb:3) – 00:04.972 (181 / 76)

Finished in 6 minutes 36 seconds (files took 32.79 seconds to load)
3158 examples, 0 failures, 7 pending

In this real-world example taken from one of our projects (before refactoring), more than three of the six and a half minutes of the test run were spent generating test data, which accounts for almost 50%. That should not surprise you: in some codebases I worked on, generating records from factories took as much as 80% of the test time.
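As a quick check of that figure, the share can be computed from the two durations printed in the report. This is only a sketch of the arithmetic: `to_seconds` is a helper name made up for this example and is not part of TestProf.

```ruby
# Hypothetical helper (not part of TestProf): converts "MM:SS.mmm" to seconds.
def to_seconds(timestamp)
  minutes, seconds = timestamp.split(":")
  minutes.to_i * 60 + seconds.to_f
end

factory_time = to_seconds("03:07.353") # "Total time" from the EventProf report
suite_time   = 6 * 60 + 36             # "Finished in 6 minutes 36 seconds"

share = (factory_time / suite_time * 100).round(1)
puts "Factories take #{share}% of the run" # prints "Factories take 47.3% of the run"
```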

Keep calm and continue reading: we know how to fix this.

The name of the game is “cascade”

From years of observations and working on TestProf, profiling pretty much anything test-related, one reason for slow tests stands out the most—factory cascade.

Let’s play a little game:

factory :comment do
  sequence(:body) { |n| "Awesome comment ##{n}" }
  author
  answer
end

factory :answer do
  sequence(:body) { |n| "Awesome answer ##{n}" }
  author
  question
end

factory :question do
  sequence(:title) { |n| "Awesome question ##{n}" }
  author
  account # suppose it's our tenant in SaaS application
end

factory :author do
  sequence(:name) { |n| "Awesome author ##{n}" }
  account
end

factory :account do
  sequence(:name) { |n| "Awesome account ##{n}" }
end

Now, try to guess how many records are created in the database once you call create(:comment). If you have taken your pick, read on.

First, we generate a body for the comment. No records created yet, so our score is zero.

Next, we need an author for the comment. The author should belong to an account; thus we create two records. Score: 2.

Every comment needs a commentable object, right? In our case, it’s an answer. An answer itself needs an author with an account. That is two more records. Score: 2 + 2 = 4.

The answer also needs a question, which has its own author with its own account. Furthermore, our :question factory also contains an account association, so that is four more records (the question itself, its author, and two accounts). Score: 4 + 4 = 8.

Now we can create the answer and, finally, the comment itself. Score: 8 + 2 = 10.

That is it! Creating a comment with create(:comment) yields ten database records.

There could be many more records. Think of associated models generated through Active Record callbacks (never do that).

Do we need multiple accounts and different authors to test a single comment? Unlikely.

You can imagine what happens when we create multiple comments, say, create_list(:comment, 10). Houston, we’ve had a problem.
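The score can be double-checked with a few lines of plain Ruby. This is a toy model, not factory_bot itself: `ASSOCIATIONS` and `records_created` are names invented for this sketch, and we assume every association is created from scratch, as in the factories above.

```ruby
# Associations declared by each factory in the example above.
ASSOCIATIONS = {
  comment:  %i[author answer],
  answer:   %i[author question],
  question: %i[author account],
  author:   %i[account],
  account:  []
}.freeze

# Toy stand-in for create: every association is created from scratch first,
# then the record itself. Returns the number of records written.
def records_created(factory)
  ASSOCIATIONS.fetch(factory).sum { |assoc| records_created(assoc) } + 1
end

puts records_created(:comment)      # prints 10
puts records_created(:comment) * 10 # prints 100, as with create_list(:comment, 10)
```

In this model, ten comments mean a hundred records, of which only ten are the comments we actually asked for.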

Meet factory cascade—an uncontrollable process of generating excess data through nested factory invocations.

We can represent a cascade as a tree:

comment
|
|-- author
|    |
|    |-- account
|
|-- answer
     |
     |-- author
     |    |
     |    |-- account
     |
     |-- question
          |
          |-- author
          |    |
          |    |-- account
          |
          |-- account

Let’s call this representation a factory tree. We are going to use it later in our analysis.

Fire walk with me

EventProf only shows us the total time spent in factories, so we can tell that something is wrong. However, we still have no idea where to look, unless we dig through the code and play a guessing game. With another tool out of TestProf’s doctor bag, we don’t have to.

Meet yet another profiler: FactoryProf. You can run it like this:

$ FPROF=1 bundle exec rspec

The resulting report lists all factories and their usage statistics:

[TEST PROF INFO] Factories usage

total   top-level   name
 1298           2   account
 1275          69   city
  524         516   room
  551         549   user
  396         117   membership

524 examples, 0 failures

How are our total and top-level results different? The total value is the number of times a factory has been used to generate a record either explicitly (through a create call), or implicitly, within another factory (through associations and callbacks); the top-level value only considers explicit calls.

Thus, a noticeable difference between top-level and total values might indicate a factory cascade: it tells us that a factory is more often invoked from other factories than by itself.
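To make the two columns concrete, here is a minimal plain-Ruby sketch of how such counters could be kept. It is not FactoryProf’s implementation: `ASSOCIATIONS`, `STATS`, and this `create` are hypothetical, and the room → city → account chain is an assumption made only for the example.

```ruby
# Toy model: which factories each factory invokes through associations.
# The chain below is assumed for illustration only.
ASSOCIATIONS = { room: [:city], city: [:account], account: [] }.freeze

# Hypothetical counters, mimicking the two columns of the report.
STATS = Hash.new { |hash, name| hash[name] = { total: 0, top_level: 0 } }

def create(factory, depth = 0)
  STATS[factory][:total] += 1
  STATS[factory][:top_level] += 1 if depth.zero? # explicit call from a test
  ASSOCIATIONS.fetch(factory).each { |assoc| create(assoc, depth + 1) }
end

3.times { create(:room) } # three explicit calls, each cascading to city and account
create(:account)          # one explicit call

STATS.each do |name, stat|
  puts format("%5d %9d   %s", stat[:total], stat[:top_level], name)
end
```

Here account ends up with a total of 4 but a top-level count of 1, and city with 3 and 0: the same kind of gap that hints at a cascade in the real report.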

How do we pinpoint those “other factories”? With the help of factory trees discussed earlier! Let’s flatten our tree (using pre-order traversal) and call the resulting list a factory stack:

# factory stack built from the factory tree above
[:comment, :author, :account, :answer, :author, :account, :question, :author, :account, :account]

Here is how a factory stack can be built programmatically:

Every time FactoryBot.create(:thing) is called, a new stack is initialized (with :thing as the first element).

Every time another factory is used within :thing, we push it to the stack.
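These two rules can be sketched in plain Ruby. The `StackTracker` class below is hypothetical and much simpler than whatever FactoryProf does internally; it only shows that tracking the nesting depth is enough to get the pre-order list.

```ruby
ASSOCIATIONS = {
  comment:  %i[author answer],
  answer:   %i[author question],
  question: %i[author account],
  author:   %i[account],
  account:  []
}.freeze

# Hypothetical tracker, not FactoryProf's actual code.
class StackTracker
  attr_reader :stacks

  def initialize
    @stacks = []
    @depth = 0
  end

  def create(factory)
    @stacks << [] if @depth.zero? # top-level call: start a new stack
    @stacks.last << factory       # every call, nested or not, is pushed
    @depth += 1
    ASSOCIATIONS.fetch(factory).each { |assoc| create(assoc) }
    @depth -= 1
  end
end

tracker = StackTracker.new
tracker.create(:comment)
p tracker.stacks.first
```

The printed array matches the factory stack shown earlier, because nested calls happen in exactly the order of a pre-order traversal of the factory tree.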

Why are stacks cool? Exactly as with call stacks, we can draw flame graphs! And what is cooler than a flame graph?

FactoryProf knows how to generate interactive HTML flame graph reports out of the box. Here is another command line invocation:

$ FPROF=flamegraph bundle exec rspec

The output contains a path to an HTML report:

[TEST PROF INFO] FactoryFlame report generated: tmp/test_prof/factory-flame.html

Open it in your browser to see something like this:

An interactive FactoryFlame report

How do we read this?

Every column represents a factory stack. The wider the column, the more times this stack occurred in the test suite. The root cell shows the total number of top-level create calls.
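The grouping behind the columns is easy to picture in code. In this sketch the list of stacks is made up, and counting identical stacks with `tally` (Ruby 2.7+) is our assumption about how column widths could be derived, not a description of the report’s internals.

```ruby
# Made-up stacks collected during a run (one per top-level create call).
stacks = [
  %i[user],
  %i[room city account],
  %i[user],
  %i[room city account],
  %i[room city account]
]

# Identical stacks collapse into one column; the count is the column's width.
columns = stacks.tally.sort_by { |_stack, count| -count }

columns.each { |stack, count| puts "#{count} x #{stack.join(' -> ')}" }
puts "root: #{stacks.size} top-level create calls"
```

A tall and wide column, like room -> city -> account here, is the first candidate for refactoring: it is both deep and frequent.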

If your FactoryFlame report looks like the New York City skyline, then you have a lot of factory cascades (each “skyscraper” represents a cascade):

FactoryFlame, Gotham version

Though a joy to behold, this is not what your ideal cascade-less report should look like. Instead, you should aim for something flat, like the Dutch countryside:

Where are the windmills?

Doctor, am I going to live?

Knowing how to find cascades is not enough–we need to eliminate them. Let’s consider several techniques for that.

Explicit associations

The first thing that comes to mind is to remove all (or almost all) associations from our factories:

factory :comment do
  sequence(:body) { |n| "Awesome comment ##{n}" }
  # do not declare associations
  # author
  # answer
end

With this approach, you have to explicitly specify all required associations when using a factory:

create(:comment, author: user, answer: answer)

# But!
create(:comment) # => raises ActiveRecord::RecordInvalid

By the way, removing optional associations from factories is always a good idea.

One may ask: aren’t we using factories precisely to avoid specifying all the required arguments every time? Yes, we are. With this approach, factories become faster, but also less useful.

Association inference

Sometimes (usually when dealing with denormalization) it’s possible to infer associations from other ones:

factory :question do
  sequence(:title) { |n| "Awesome question ##{n}" }
  author
  account do
    # infer account from author
    author&.account
  end
end

Now we can write create(:question) or create(:question, author: user) without creating a separate account.

We can also use lifecycle callbacks:

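factory :question do
  sequence(:title) { |n| "Awesome question ##{n}" }

  transient do
    # :undef marks "not passed" (factory_bot 5+ requires the block syntax)
    author { :undef }
    account { :undef }
  end

  after(:build) do |question, evaluator|
    # transient attributes are not assigned to the record, so read them from
    # the evaluator and assign them explicitly
    question.author = evaluator.author unless evaluator.author == :undef
    question.account = evaluator.account unless evaluator.account == :undef
    # if only author is specified, set account to author's account
    question.account ||= question.author&.account
    # if only account is specified, set author to account's owner
    question.author ||= question.account&.owner
  end
end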

This approach can be very efficient but requires a lot of refactoring (and, frankly speaking, makes factories less readable).

Factory default

TestProf provides yet another way to eliminate cascades: FactoryDefault. It is an extension for factory_bot that enables a more succinct and less error-prone DSL for creating defaults with associations by allowing you to re-use records inside the factory implicitly. Consider this example:

describe 'PATCH #update' do
  let!(:account) { create_default(:account) }
  let!(:author) { create_default(:author) } # implicitly uses account defined above
  let!(:question) { create_default(:question) } # implicitly uses account and author defined above

  let(:answer) { create(:answer) } # implicitly uses question and author defined above

  let(:another_question) { create(:question) } # uses the same account and author
  let(:another_answer) { create(:answer) } # uses the same question and author

  # ...
end

The main advantage of this approach is that you don’t have to modify your factories. All you need is to replace some create(…) calls with create_default(…) in your tests.

On the other hand, this feature introduces a bit of magic to your tests, so use it with caution, as tests should stay as human-readable as possible. It is a good idea to use defaults only for top-level entities (such as tenants in multi-tenancy apps).
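To see why defaults cut cascades short, consider a toy model in plain Ruby. `ToyFactory` is a hypothetical class written for this sketch and only loosely follows the idea behind FactoryDefault: when a default is registered for a factory, associations reuse it instead of creating a new record.

```ruby
ASSOCIATIONS = {
  answer:   %i[author question],
  question: %i[author account],
  author:   %i[account],
  account:  []
}.freeze

# Hypothetical registry, not TestProf's implementation.
class ToyFactory
  attr_reader :created

  def initialize
    @defaults = {}
    @created = 0
  end

  def create(factory)
    record = { type: factory }
    ASSOCIATIONS.fetch(factory).each do |assoc|
      # reuse the registered default instead of cascading into a new record
      record[assoc] = @defaults.fetch(assoc) { create(assoc) }
    end
    @created += 1
    record
  end

  def create_default(factory)
    @defaults[factory] = create(factory)
  end
end

plain = ToyFactory.new
plain.create(:answer)
puts plain.created # prints 7

with_defaults = ToyFactory.new
with_defaults.create_default(:account)
with_defaults.create_default(:author)
with_defaults.create_default(:question)
with_defaults.create(:answer)
puts with_defaults.created # prints 4
```

In this model, every further create(:answer) with defaults in place adds exactly one record, while the plain version adds seven each time.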

Bonus: AnyFixture

So far we have talked only about factory cascades. What else could we learn from TestProf reports?

Let’s take a look at the FactoryProf report again:

[TEST PROF INFO] Factories usage

total   top-level   name
 1298           2   account
 1275          69   city
  524         516   room
  551         549   user

524 examples, 0 failures

Take note that room and user factories are used about the same number of times as the total number of examples. Thus, it is likely that we need both in every example. What about creating those records once, and for all examples? For that, we can use fixtures!

Since we already have factories, it would be great to re-use them to generate fixtures. Here comes AnyFixture.

You can use any block of code for data generation, and AnyFixture takes care of cleaning out the database at the end of the run.
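The core idea, running a block once and handing back the cached result afterwards, fits in a few lines of plain Ruby. `ToyFixture` below is a hypothetical, heavily simplified stand-in: it keeps objects in memory and knows nothing about the database, unlike the real AnyFixture.

```ruby
# Hypothetical, simplified registry; it only memoizes in-memory objects.
module ToyFixture
  @cache = {}

  class << self
    # Runs the block only the first time a name is requested.
    def fixture(name)
      @cache.fetch(name) { @cache[name] = yield }
    end

    # Stand-in for the clean-up at the end of the run.
    def reset
      @cache.clear
    end
  end
end

calls = 0
3.times do
  ToyFixture.fixture(:user) do
    calls += 1
    { name: "Test user" }
  end
end
puts calls # prints 1
```

However many examples ask for the :user fixture, the expensive block runs once; that is what turns a factory into something as cheap as a fixture.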

AnyFixture works perfectly with RSpec’s shared contexts:

# Activate AnyFixture DSL (fixture) through refinements
using TestProf::AnyFixture::DSL

shared_context "shared user", user: true do
  # You should call AnyFixture outside of transaction to re-use the same
  # data between examples
  before(:all) do
    fixture(:user) { create(:user, room: room) }
  end

  let(:user) { fixture(:user) }
end

And then activate this shared context:

describe CitiesController, :user do
  before { sign_in user }
  # ...
end

Check out this real-world example for AnyFixture being used to set up a context for acceptance tests.

With AnyFixture enabled, the FactoryProf report could look like this:

total   top-level   name
 1298           2   account
 1275          69   city
    8           1   room
    2           1   user

524 examples, 0 failures

Looks good, doesn’t it?

Do not choose between factories and fixtures–use both!

Thank you for reading!

Factories bring simplicity and flexibility to your test data generation, but they are very fragile—cascades can come out of nowhere, repetitive creation can consume too much time.

Take care of your factories and take them to a doctor (TestProf) regularly. Make tests faster and developers happier!

Read the TestProf introduction to learn more about the motivation behind the project and other use cases.