Tests slow? Flaky? Hard to understand?

I’m Tom, a QA Engineer in the ASOS Tech team, and if you’re familiar with any of the issues above, then I encourage you to read on.

I joined ASOS Tech last year, entering the Saved Items API team. If you shop with ASOS, we’re the team that deals with managing, storing and providing the data for the products that people ‘save’ to come back to later.

Saved Items was already established when I joined and had a solution in place for its development and testing. This consisted of a regression pack of acceptance, integration and unit tests. These were all run by developers after a code change, and automatically run in our pipeline on commits, with a policy in our repository whereby code changes could not be merged unless all tests passed. So, we had a solution to ensure quality and make sure that bad code does not slip through the cracks.

Coming from my previous job, where developers could (and did) merge straight to master (consequently breaking stuff), I was happy with what we were doing here.

However, to my surprise, our developers weren’t happy with our tests, and raised the following concerns:

Tests were slow to run — they were unable to run in parallel

Tests couldn’t always be trusted — sometimes they would pass and sometimes they would fail with no code changes. Tests were not independent. One test could affect the results of another test

It was hard to figure out exactly what a test was doing

We had a testing pack that was slow, cumbersome and couldn’t be trusted, effectively ruining the idea that it provided quality to our API. This wouldn’t do.

We set out to make changes to ensure people could trust our tests and would want to run them, to give confidence in our code.

Firstly, we needed to find out the root cause of these problems. I did some digging and debugging, and these were my findings:

Slow: tests would run sequentially, one after another

Untrustworthy: tests were not fully independent. One test could affect the results of the next

Hard to decipher: method names were unclear, and didn’t always translate into acceptance criteria

We would each run these tests ourselves before changes and commits, re-running two or three times on average because they were unreliable. These tests would run sequentially, so weren’t quick to re-run either.

Not only that, we incorporated these automation tests into our pipeline. We would run these tests against different test environments, and only once they all passed would we allow a merge to master. Again, re-running them is a nuisance, but consider this:

You’ve found a bug in production. You need to get a fix out urgently. You create the fix, you run your tests, they pass and you push your change out. Your pipeline picks it up and starts running your tests again. FAIL.

We’re trying to get an urgent fix out and we’re being slowed down by tests failing. Production quality is currently reduced. But tests are meant to increase quality?

And so I created BDTest — a testing framework for .NET Core and a set of good practices that help ensure test stability, test speed and ease of understanding. This framework follows an approach called Behaviour Driven Testing, which we’ll get familiar with first.

Behaviour Driven Testing (BDT)

BDT focuses on user behaviour rather than intricate system technicalities.

I’ll walk you through what I consider to be the benefits of BDT and how you can implement BDTest to highlight and make use of these benefits.

What are the benefits?

Requirements

If we’re working within an agile environment, and working with user stories, then we should ideally have acceptance criteria. These criteria usually follow a GWT (Given, When, Then) approach.

Given *some setup actions*,

When *I perform the action-under-test*,

Then *this outcome is as expected*

This GWT approach can be transformed into BDT. We can write these statements as steps that perform the actions described. We’re focusing on the actions of the users, which match the requirements set out in our acceptance criteria.

Clarity

By focusing on actions, rather than technicalities, we can show and share our tests with business analysts, product owners and stakeholders to showcase what testing we’ve performed. This will be clear and understandable, as there’ll be no jargon and no prerequisite technical know-how, just actions that a user would undertake. This allows review, feedback and clarity from the rest of your team regarding the testing that has taken place.

Re-usability

By writing our tests in steps, we’re using a modular approach, creating building blocks. We can promote re-usability of our tests, making writing tests faster and fixing them easier. E.g. ‘Given I login to the system as an admin’.

This can be re-used within multiple tests. Also, if this process changes, just altering this one step will fix all tests that use it.

Maintainability

This is similar to above. Technical processes may change, but user actions generally remain the same. If the login process is completely rewritten on the backend, the user need not know about this. Therefore, if our step, mimicking the user, is ‘Given I login’, then this step remains the same within our tests.

BDTest

BDTest is a framework for .NET Core 2.0 and above, focusing on:

Data

Data is a powerful asset. I always think of the phrase ‘knowledge is power.’

BDTest will collect data while tests are running, and use this to produce reports. You can also access the raw data yourself to create custom reports, or perform data science wizardry.

Test stability and speed

Tests can be flaky. I’ve seen tests influencing other tests (no-no!), and a lack of concurrency because of shared members. BDTest best practices encourage independence among tests, which results in better stability and speed through parallelism.

BDTest is available as NuGet packages.

For the framework itself: Install-Package BDTest

For the reports: Install-Package BDTest.ReportGenerator

For an NUnit Context Injection Helper: Install-Package BDTest.NUnit

The tech part

Let’s start with a custom class to hold our test data — we’ll call it TestContext. We want this to hold some context about the test and what’s happened, so that when we pass it into different methods, such as validators, it has data to look at.

Here’s our simple TestContext, holding our HTTP request and response.
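As a minimal sketch of what such a class could look like, assuming the request and response are the standard System.Net.Http types (the property names here are my own illustration):

```csharp
using System.Net.Http;

// Illustrative context class: holds what happened during a single test.
// No constructor is declared, so it keeps the default parameterless one.
public class TestContext
{
    // The request the test sent (property name is an assumption).
    public HttpRequestMessage Request { get; set; }

    // The response the test received, for validators to inspect.
    public HttpResponseMessage Response { get; set; }
}
```

Steps can then fill in Request and Response as they run, and validation steps only need the context passed to them.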

(To use test context injection with BDTest, there’s a constraint: the object must have a no-args (0 parameters) constructor. Otherwise you should implement your own injection logic.)

Let’s start writing a test…

BDTest recommends one test class per feature/fixture. A feature should relate to a single specific area of the system we’re testing.

This will help with maintainability, searching and reporting of tests.

We’ll test logins in this example.

If you’re using NUnit, I recommend extending from NUnitBDTestBase, and passing in your TestContext type. This would look like:

If you’re not using NUnit, extend from BDTestBase:

Previously our test classes were similar to this:

What’s the issue?

The TestContext field!

Why?

It’s available to any member of the class. So different tests inside this class can all access it. After some investigation, I found that we weren’t clearing out this context after each test. Some tests were falsely passing, because they were validating against data that had been left there from the test that ran before it.

As well as this, we couldn’t turn parallelism up to full, because all of the tests read from and wrote to this same object. With parallel test execution switched on, one test could set the HttpResponse object in our context and, immediately afterwards, another test could overwrite it with its own HttpResponse. By the time the first test performed its assertions, it would be looking at the wrong HttpResponse, and it would fail!

There’s no test independence here. And tests should be independent.
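To make the false pass concrete, here is a small self-contained sketch. It is not our real code: the class and method names are invented, plain exceptions stand in for assertions, and it assumes the runner reuses one instance of the class for all of its tests (as NUnit does by default).

```csharp
using System;

// Hypothetical stand-in for a context object that stores the last response.
public class SharedContext
{
    public int? StatusCode { get; set; }
}

public class LeakyLoginTests
{
    // Shared by every test in the class: this is the problem.
    private readonly SharedContext _context = new SharedContext();

    public void SuccessfulLogin_Returns200()
    {
        _context.StatusCode = 200; // pretend an HTTP call happened here
        if (_context.StatusCode != 200) throw new Exception("Expected 200");
    }

    public void LoginValidation_ForgetsToCallTheApi()
    {
        // Bug: no request is made, yet this still passes when it runs after
        // the test above, because the old 200 was never cleared.
        if (_context.StatusCode != 200) throw new Exception("Expected 200");
    }
}
```

Run on its own against a fresh instance, the second test fails. Run after the first, it passes, and the result depends on test order rather than on the code under test.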

There was some refactoring to be done — partly why I created BDTest.

I prefer the NUnit approach here, but I’ll show you my two ways around this:

If you’re extending from BDTestBase (without NUnit)

Here we have a WithContext helper. It will construct you a test context that is only available to its scope. Within that scope, you write your test, and it can’t be influenced by anything outside of this scope.
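The idea behind a scoped helper like this can be sketched in a few lines. This is only an illustration of the pattern, not BDTest’s actual implementation, and the ScopedContext class name is made up:

```csharp
using System;

public static class ScopedContext
{
    // Builds a fresh T for one test body and hands it to that body only.
    // The 'new()' constraint is why a parameterless constructor is needed.
    public static void WithContext<T>(Action<T> testBody) where T : new()
    {
        var context = new T();
        testBody(context);
    }
}
```

A test would then be written as ScopedContext.WithContext<TestContext>(context => { ... }), and nothing outside the lambda ever holds a reference to that context.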

Or if you’re extending from NUnitBDTestBase<>

This is my preferred way. Here you can see we’re calling Context. We haven’t declared that anywhere though.

The base class is doing all the magic for us here. For every NUnit test, whenever you call Context, it will check whether a context has already been created for that test. If so, great, it’ll hand you that same object. If not, it’ll create one for you, of the type that you pass into the base class. It can differentiate between tests thanks to NUnit’s own TestContext.

This looks much cleaner, and again, we’re preventing leaks and influences from other tests. Each test has its own context now. Perfect.
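One way such a per-test lookup could work is a thread-safe dictionary keyed by a test identifier. The sketch below is my own illustration of that idea rather than BDTest’s source. The PerTestContextStore name is hypothetical, and with NUnit the key could be something like NUnit.Framework.TestContext.CurrentContext.Test.ID.

```csharp
using System;
using System.Collections.Concurrent;

public class PerTestContextStore<T> where T : new()
{
    private readonly ConcurrentDictionary<string, T> _contexts =
        new ConcurrentDictionary<string, T>();

    // Returns the context already created for this test,
    // or creates one the first time the test asks for it.
    public T GetFor(string testId) => _contexts.GetOrAdd(testId, _ => new T());

    // Call when a test finishes so old contexts do not pile up.
    public bool Remove(string testId) => _contexts.TryRemove(testId, out _);
}
```

Two calls with the same test id return the same object, while a different test id always gets its own.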

As well as making sure each context is now new’d up per test (so no old dodgy data is left hanging around messing up our validation), we can also turn parallelism up to full. We aren’t sharing fields between tests now, so nothing is stopping us!
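With NUnit 3, parallelism is switched on with attributes. The fragment below is a sketch that assumes the NUnit package is referenced. The fixture and test names are placeholders, and the thread count of 8 is only an example value.

```csharp
using NUnit.Framework;

// Optional cap on the number of worker threads; 8 is just an example.
[assembly: LevelOfParallelism(8)]

// All = this fixture may run alongside other fixtures,
// and its tests may run alongside each other.
[TestFixture]
[Parallelizable(ParallelScope.All)]
public class LoginTests
{
    [Test]
    public void SuccessfulLogin() { /* uses only per-test state */ }

    [Test]
    public void FailedLogin() { /* uses only per-test state */ }
}
```

This is only safe once tests stop sharing fields, which is exactly what the per-test context gives us.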

Our structure and setup is complete. We have an injection system in place, and all it takes is extending from a base class. Easy.

Now how do we actually write tests?

Tests require a

‘Given’ (setup steps)

‘When’ (the action to test)

‘Then’ (validation that the behaviour is as desired)

and a .BDTest() to kick it all off!

This looks like:

It’s pretty readable — even to someone who doesn’t code.

As long as you, the test writer, enforce a good naming convention, a stakeholder would be able to look at your tests and know what’s happening.

There are some rules around the GWT, but they’re beneficial, as they enforce a best practice. This means that tests have to be implemented properly and in a clean manner.

Tests must contain a Given + When + Then and be executed with a BDTest

Tests must start with a Given

From a Given you can have And or When

From an And (after a Given) you can have And or When

From a When you can have a Then (No And - we should be testing one action!)

From a Then you can have And or BDTest

From an And (after a Then) you can have And or BDTest

We had a similar framework before, but you could use any of these, in any order, and even miss bits out.

Given + Then + Given + Given + When + Then + When doesn’t really make sense though, does it?

So why would we want the option to potentially write our tests like that?

We don’t! These rules help to keep your tests clear and concise.
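Ordering rules like these can be enforced by the compiler if each stage of the chain only exposes the methods that are legal next. The sketch below is a simplified illustration of that technique, not BDTest’s real API. It records step descriptions as strings instead of executing step methods, and the Scenario, GivenStage, WhenStage, ThenStage and Run names are all invented for the example.

```csharp
using System.Collections.Generic;

public sealed class GivenStage
{
    private readonly List<string> _steps;
    internal GivenStage(List<string> steps) { _steps = steps; }

    public GivenStage And(string step) { _steps.Add("And " + step); return this; }
    public WhenStage When(string step) { _steps.Add("When " + step); return new WhenStage(_steps); }
}

public sealed class WhenStage
{
    private readonly List<string> _steps;
    internal WhenStage(List<string> steps) { _steps = steps; }

    // No And here: a scenario exercises exactly one action.
    public ThenStage Then(string step) { _steps.Add("Then " + step); return new ThenStage(_steps); }
}

public sealed class ThenStage
{
    private readonly List<string> _steps;
    internal ThenStage(List<string> steps) { _steps = steps; }

    public ThenStage And(string step) { _steps.Add("And " + step); return this; }

    // Terminal call; in this sketch it simply returns the recorded steps.
    public IReadOnlyList<string> Run() => _steps;
}

public static class Scenario
{
    // The only entry point, so every chain has to start with a Given.
    public static GivenStage Given(string step) =>
        new GivenStage(new List<string> { "Given " + step });
}
```

Scenario.Given("an account exists").When("I log in").Then("I see my saved items").Run() compiles, whereas a chain such as Given + Then + Given does not, because GivenStage has no Then method.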

Reports

We’ve installed the ReportGenerator NuGet package (if not, scroll up!), and we’ve run some tests. Now we want information.

As standard, these will be produced in your project’s output directory. Typically, this is:

..{project directory}\bin\Debug\netcoreapp2.*\

We should have two HTML reports, an XML file and a JSON file.

These are:

An HTML Results Report that organises scenarios by their story/feature

An HTML Results Report that lists all scenarios run, unorganised

A JSON dump of the raw test data — maybe your data analysis team can do clever things with these?

An XML dump of the raw test data — same as above

If we open the report, we’ll see things like ‘story not defined’. This is where we focus on clarity for stakeholders and business analysts. While we recommend enforcing clear method and class names, code doesn’t always convert perfectly to structurally sound statements.

There are recommended attributes to add to your project to aid with this.

[Story]

Annotate your test classes with a ‘story’ attribute. You will need to define three parameters: As a …, I want…, so that…

This looks like:

[ScenarioText]

Tests by default will try to use your method name as the ‘scenario’.

E.g. EnterEmptyTextAndClickSubmit() as the test method name will result in a scenario of ‘Enter Empty Text and Click Submit’
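A conversion like this can be approximated by splitting the method name wherever a lower-case letter or digit is followed by a capital. This is a rough sketch of the idea with a made-up ScenarioNames helper, and BDTest’s own conversion may differ in details such as the casing of small words like ‘and’:

```csharp
using System.Text.RegularExpressions;

public static class ScenarioNames
{
    // Inserts a space at every boundary between a lower-case letter
    // (or digit) and the capital letter that follows it.
    public static string Humanize(string methodName) =>
        Regex.Replace(methodName, "(?<=[a-z0-9])(?=[A-Z])", " ");
}
```

Here ScenarioNames.Humanize("EnterEmptyTextAndClickSubmit") gives "Enter Empty Text And Click Submit".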

However, method names can’t and don’t always align with how we would describe it as a sentence. As such, annotate your test methods with the ScenarioText attribute to set a custom scenario title.

[StepText]

Each step action/method should also be described with the StepText attribute. This should be written without any Given, When or Then keywords, as these are added automatically based on where the step is used in the test. (See below)

[StepText] (with indexed parameter substitution)

We can also use index placeholders in our step text attributes. BDTest will automatically substitute the ToString of the matching parameter into the StepText.
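Conceptually, this behaves like .NET’s own string.Format with the keyword put in front. The helper below is a hypothetical sketch of that behaviour, with names of my own choosing, rather than the code BDTest runs:

```csharp
public static class StepDescriptions
{
    // Replaces {0}, {1}, ... with the matching argument's string form,
    // then prefixes the keyword for the step's position in the test.
    public static string Describe(string keyword, string stepText, params object[] args) =>
        keyword + " " + string.Format(stepText, args);
}
```

So StepDescriptions.Describe("Given", "I log in as {0} with password {1}", "tom", "secret") produces "Given I log in as tom with password secret".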

Once our tests are thoroughly descriptive, rerunning them should produce reports that are clear to read, regardless of whether you know how to write code or not. They are displayed in simple, human language, and therefore can be cascaded to business analysts, stakeholders, or anyone else you can think of. The raw data dump files can even be given to data scientists for clever metric usage.

Persistent test data

BDTest also offers the benefit of persisting your results.

Why would you want to do this?

The benefit here is that this would allow us to compare different and historical test runs.

Out of the box, we have a test run-time comparison, as well as a flaky test report. However, you can also use the persisted data dumps of all your runs to perform your own custom reports. Just parse the JSON files and use the data however you like.

To persist your results, simply set the property held in BDTest.BDTestSettings (this can be accessed statically).

If you set this, and run your tests multiple times, you’ll notice data dumps being output to this directory. Do with these what you like, or simply enjoy the extra test-time and flaky-test reports that, as mentioned, come out of the box with BDTest.

There are also some other options in BDTestSettings, such as:

Pruning data dump files older than a DateTime

Pruning data dump files if you exceed a file-count limit

Customising the filenames of the reports produced
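To show what the two pruning options amount to, here is a standalone sketch using plain System.IO. It is not BDTest’s implementation: the DumpPruner class is hypothetical, and it assumes the dumps are *.json files sitting in a single directory.

```csharp
using System;
using System.IO;
using System.Linq;

public static class DumpPruner
{
    // Deletes *.json files last written before the cutoff; returns how many.
    public static int PruneOlderThan(string directory, DateTime cutoffUtc)
    {
        var stale = new DirectoryInfo(directory)
            .GetFiles("*.json")
            .Where(f => f.LastWriteTimeUtc < cutoffUtc)
            .ToList();
        foreach (var file in stale) file.Delete();
        return stale.Count;
    }

    // Keeps the newest maxFiles *.json files and deletes the rest.
    public static int PruneToCount(string directory, int maxFiles)
    {
        var excess = new DirectoryInfo(directory)
            .GetFiles("*.json")
            .OrderByDescending(f => f.LastWriteTimeUtc)
            .Skip(maxFiles)
            .ToList();
        foreach (var file in excess) file.Delete();
        return excess.Count;
    }
}
```

Either rule keeps the persisted results directory from growing without limit while still leaving enough history for comparisons.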

What problems have we solved?

Context leaks

We’ve removed any shared fields. Dependencies are injected into tests dynamically. They’re now unable to leak into and influence the data of other tests. This means tests are truly standalone. Since switching over, our tests are more stable, more reliable, and less flaky. They are consistent because we know the data in each context is truly independent of the data within any other test.

Parallel tests

As above, because of our test dependency injection and because everything is able to run standalone, we can turn on full parallelism. This results in faster testing, faster bug finding and faster results. We’ve maximised our testing speed, and now we’re only limited by our hardware. Using NUnit, I’ve found the speed-up to be roughly proportional to the number of cores in your machine’s processor. Our machines had eight cores, and so our tests would run approximately 8x faster!

Test flakiness

We’re able to see which tests are flaky, so that we can better investigate, fix and maintain these tests so that they’re stable and consistent.

Test times

Similar to above, we’re able to see which tests are taking too long, or which vary greatly in execution time. We can then investigate and optimise them.

Stakeholder reporting

We’ve translated code into proper structured language. We can cascade these reports to stakeholders, which clarify what actions have been taken, what was being tested and what the result was.

Life after BDTest

In Saved Items at ASOS we now utilise BDTest in a number of our projects. We’ve automated Acceptance Tests, Component Tests and Integration Tests that all utilise the benefits of BDTest and follow the best practices set out.

Our path to live is cleaner and easier, as our tests are much more consistent. If we see a flaky test, upon some investigation, we may find that it’s a real issue, whereas before it might just have been passed off with the excuse ‘the tests are flaky.’

Before, the tests served little purpose as they weren’t trustworthy. Now, we trust them, and they do improve quality.

Our automated pipelines are also fast. Releasing a change is quicker thanks to the true parallelism of our tests. If we have an issue, we want to fail fast. If we don’t, we want to deploy fast.

We also have reports. Send the status to whomever, and we have a set of steps that looks like someone has sat down and written them.

So, if you like what you have read, give it a try!