The Symptom

So, your tests are taking a long time to run. You go in and try to figure out why, and you find many timer:sleep/1 calls scattered across different test cases and suites (True Story: I’ve seen timer:sleep(10000) more than once! — I even wrote some of them 🙈). Of course, that’s why the tests are slow: they’re just stalling there for seconds at a time… just sleeping.

Why do we do that?

There might be multiple reasons for introducing timer:sleep/1 calls in our test cases, but the general structure of those tests is something like this:

test(_Config) ->
    do:something(in_background),
    timer:sleep(6000),
    verify_the_expected_result().

We evaluate some function that will return immediately while also triggering some background task. Then we wait enough time for that background task to complete and check that it, in fact, was completed correctly.

We can’t just not wait (i.e. remove the call to timer:sleep/1 entirely) because, given the concurrent nature of Erlang, it’s more than likely that the expected side effect will not have happened yet by the time the test evaluates the next line.
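To make the race concrete, here is a minimal, hypothetical stand-in for the do:something(in_background) call above (the module and names are invented for illustration): a spawned process that records its result in a public ETS table after a short delay. Checking the table immediately after spawning races against that process:

```erlang
-module(sleepy_demo).
-export([run/0]).

run() ->
    Tab = ets:new(results, [public]),
    %% The "background task": a process that takes a while
    %% before it writes its result.
    spawn(fun() ->
                  timer:sleep(200),
                  ets:insert(Tab, {result, done})
          end),
    %% Checking right away almost certainly finds nothing,
    %% because the spawned process hasn't written yet.
    Immediate = ets:lookup(Tab, result),
    %% After sleeping "long enough", the result is there.
    timer:sleep(1000),
    Later = ets:lookup(Tab, result),
    {Immediate, Later}.
```

Running sleepy_demo:run() will, in practice, return {[], [{result, done}]}: the immediate check misses, the delayed check succeeds. That gap is exactly what the timer:sleep/1 in the test is papering over.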

But timer:sleep/1 is not the best way to write this kind of test…

What’s wrong with timer:sleep/1 ?

To use timer:sleep/1 you need to choose a number (i.e. how many milliseconds you wish this process to sleep for). Choosing the right value for that parameter is generally hard, if not downright impossible.

Let’s assume that you know that your background task will never last more than 5 seconds (there is a hard timeout somewhere within it).

Now you need to decide how long your test should sleep. If you choose a number below 5000, your test may report an error while your system is actually working as expected. And even if you wait exactly 5000ms, a failing test doesn’t prove the system is broken: it might be a scheduling delay, a message that sat in a queue for a bit, etc.

So, you choose a number larger than 5000. But… how much larger? 5100? 5200?… Let’s pick 6000 just to be safe. And now each test run is 6 seconds longer because of that single call, when in reality 5000 was just the upper bound. Most likely, your background task (particularly in test mode) takes only 100ms or so.

In other words, using timer:sleep/1 forces you to trade off between unpredictable test results and wasting lots of time.

But there is a better way…

What should we do instead?

The basic idea for the solution proposed by Fred above and also implemented in ktn_task:wait_for/2,4 is to periodically check to see if we get the expected result and only fail if we don’t get it after a long time.

The implementation of ktn_task on GitHub is a bit more complex since it’s more generic, but a simplified version looks like this:

wait_for(_Task, _ExpectedResult, _SleepTime, 0) ->
    exit(timeout);
wait_for(Task, ExpectedResult, SleepTime, Retries) ->
    timer:sleep(SleepTime),
    case Task() of
        ExpectedResult -> ok;
        _SomethingElse ->
            wait_for(Task, ExpectedResult, SleepTime, Retries - 1)
    end.

You would use it like this:

test(_Config) ->
    do:something(in_background),
    ktn_task:wait_for(fun verify_the_expected_result/0, ok, 100, 60).

The key is in the last 2 parameters. Instead of simply waiting 6000ms and then checking once, you wait at most 60 rounds of 100ms, checking after each one. If, as expected, the correct result shows up well before the 6-second budget runs out, you just move on with your test… like a boss 😎
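As a rough sanity check, here is a self-contained sketch that times the polling approach against a background task that finishes in about 150ms. It inlines the simplified wait_for/4 from above (rather than depending on ktn_task), and the module and task names are invented for the example:

```erlang
-module(wait_demo).
-export([run/0]).

%% Same simplified wait_for/4 as in the post, inlined so the
%% example is self-contained.
wait_for(_Task, _ExpectedResult, _SleepTime, 0) ->
    exit(timeout);
wait_for(Task, ExpectedResult, SleepTime, Retries) ->
    timer:sleep(SleepTime),
    case Task() of
        ExpectedResult -> ok;
        _SomethingElse ->
            wait_for(Task, ExpectedResult, SleepTime, Retries - 1)
    end.

run() ->
    Tab = ets:new(results, [public]),
    %% A background task that completes quickly, well under
    %% the 60 * 100ms = 6000ms budget.
    spawn(fun() ->
                  timer:sleep(150),
                  ets:insert(Tab, {result, done})
          end),
    Check = fun() -> ets:lookup(Tab, result) end,
    %% timer:tc/1 returns {Microseconds, Result}.
    {Micros, ok} =
        timer:tc(fun() ->
                         wait_for(Check, [{result, done}], 100, 60)
                 end),
    %% Elapsed time in milliseconds: roughly 200ms here,
    %% nowhere near the 6000ms we would have slept.
    Micros div 1000.
```

On my assumptions, wait_demo:run() returns something around 200: the second 100ms round finds the result, and the remaining 58 rounds never happen. A fixed timer:sleep(6000) would have paid the full 6 seconds every single run.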

Lesson Learned. What now?

My plan is to add this as a guideline, and then get Elvis to validate it for us. Those PRs are in public repos, so if you feel like +1'ing the guideline or submitting a PR to the elvis repo for this, I’d highly appreciate it 🙏.