Behavior-driven development (BDD) and its supporting tools seem to be gaining more momentum in the Java world than ever. Cucumber, one of the most popular frameworks for supporting and automating BDD, has become the go-to choice for powering user acceptance tests (UATs), often without anyone considering the implications of adopting such a tool. Since using Cucumber was a requirement on one of the projects I have been working on, I would like to share some observations I made and describe some of the pitfalls of such a decision.

Where are we now?

Before I present my observations, I want to point out the frustration that is evident in sources all over the internet. To name just one, check out Testing like the TSA by David Heinemeier Hansson (the creator of Ruby on Rails), where he addresses certain red flags related to the process of testing and summarizes it all in his list of ‘Seven don’ts of testing’:

6. Don’t use Cucumber unless you live in the magic kingdom of non-programmers-writing-tests (and send me a bottle of fairy dust if you’re there!)

And he is just one of many people who were disappointed with the results of Cucumber adoption. But to be fair, there are some people who actually live in the magic kingdom and are happy with BDD and Cucumber. Just consider the post by Lisa Clark about her kingdom, titled Cucumber Goodness.

Where should we be?

All this goes to show that there is a way to get those mythical benefits of BDD, but adopting Cucumber as a UAT framework is simply far from enough. Let’s stop for a second and consider the ideal scenario. What are the core principles of BDD, and how does Cucumber fit into all of that? Let’s start with a blog post by Aslak Hellesøy titled The world’s most misunderstood collaboration tool. He states the following regarding the origins of Cucumber:

Cucumber was born out of the frustration with ambiguous requirements and misunderstandings between the people who order the software and those who deliver it.

He and his colleagues thought they had figured a way out. The idea was to combine automated acceptance tests, functional requirements and software documentation into one format that would be understandable by non-technical people as well as by testing tools. However, many people misunderstood this idea or simply thought that there was a shortcut to success and that it was enough to implement features in Cucumber. Oh boy, were they wrong!

Unfortunately you can’t just download Cucumber, start writing Cucumber Features and expect a nirvana of truth and enlightenment to happen on its own. There is a process to follow that involves many roles on the software team. This process is called BDD. It’s what came out of that clique I mentioned. BDD is not a tool you can download. Gojko Adzic gave BDD a new and better name: Specification by Example.

And this is, in my opinion, the biggest problem: organizations and teams adopt tools instead of processes, without a deeper understanding of what the benefits are, what is necessary to successfully implement such a change, and what the pitfalls of such a decision are.

This brings me to the ideal state – the magic kingdom. Just as Lisa and Aslak figured out, this process is not only for the programmers to write ‘their’ tests. They are merely a part of what goes on in BDD. Aslak outlines two core activities of this process:

Requirements definition: This is usually the meeting where business analysts, programmers and testers sit down and discuss what features need to be implemented. The output from this meeting is a set of examples of how these features must behave. It is just a by-product that they are written down using Cucumber or any other framework. Cucumber only supports the process of BDD.

Outside-in development: This is a practice that is well known from TDD, where failing tests drive what needs to be implemented next. As Aslak explains:

The technique is called Outside-In because programmers typically start with the functionality that is closest to the user (the user interface, which is on the outside of the system) and gradually work towards the guts of the system (business logic, persistence, messaging and so on) as they discover more of what needs to be implemented.



What are the pitfalls of Cucumber adoption?

This is the part where we need to ask some hard questions. Let’s look at the pitfalls one must avoid to successfully implement BDD and use Cucumber to write down the specification. These are not in any particular order, since I don’t want to imply that one is more important than another. They all negatively impact testing and the overall development process.

Are we just following the current trend?

Cucumber is often cited as a great example of cargo cult programming. A good test to determine whether your organization takes part in cargo cult programming is to ask the person responsible for Cucumber adoption a question like ‘Why are we even considering this framework and approach to testing?’ If the answer is ‘Because somebody/everybody does it.’ or some superficial bullshit, this is a red flag. If the organization or the team is not ready to adopt the process, plain adoption of the supporting tool can make things even worse.

There is no guarantee that a tool like Cucumber will, on its own, help developers gain deeper insights into the business domain or improve the communication between them, business analysts and testers. This is what the process needs to facilitate. Cucumber has very little to do in this area.

Is this really a good cultural/customer fit?

Another thing that is critical to the adoption is how both BDD and Cucumber fit into your culture and also the domain your clients do business in. Let’s start at home. Since Cucumber is just a supporting tool, the organization needs to be ready to allocate proper resources to implement the process first and then move on to Cucumber itself along the way. However, since Cucumber uses Gherkin to capture the specification, some programmers may find it troublesome to express test conditions in ‘English’ instead of in more common ways of writing tests. There is a great post by Jon Archer titled How to completely fail at BDD, where the author describes his experience with pushing BDD in his organization. He goes into detail about how his team reacted and what challenges he faced along the way.

So your team and organization are ready for a full BDD adoption and you can’t wait to start with specification meetings and writing down those features? Well, that is a good start, but you also need to consider the customer and their needs. Can you imagine being a business owner and having to read through a bunch of Gherkin features? Especially when they vary in detail, scope and quality? I don’t think so. Some clients might require access to those features, and some of them might even benefit from this access. But before going down this road you should make sure that this deliverable is really necessary and that it adds value to the quality of the process as well as the product itself.

What is the appropriate level of detail?

OK, let’s shift the focus to the features themselves. After your team has written at least a couple of features, you might notice certain differences in the amount of detail each scenario goes into. This is especially true when you only adopt the framework (without the process). With no communication involved, your features will vary based on the author, their understanding of the feature at hand and even their attention to detail.

Most of the feature files I have seen go into way more detail than necessary. In my honest opinion, this leads to bloated features (and step definitions) as well as to decreased value for both the team and the customer. One of the early adopters of BDD, Elizabeth Keogh, makes a good case for considering the stakeholders’ point of view:

If your scenario starts with “When the user enters ‘Smurf’ into ‘Search’ text box…” then that’s far too low-level. However, even “When the user adds ‘Smurf’ to his basket, then goes to the checkout, then pays for the goods” is also too low-level. Think about the capabilities of your business. What does it allow your users to do? What value do the stakeholders get from it? How does it actually make money? You’re looking for something like, “When the user buys a Smurf.”
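To make this concrete on the automation side, here is a minimal sketch in plain Java (deliberately without any Cucumber dependency) of how the low-level interactions can live behind a single business-level action. The Shop class and all of its methods are hypothetical names I made up for illustration; in a real project the method bodies would drive the UI or an API, and a step definition for “When the user buys a Smurf” would simply delegate to buy.

```java
import java.util.ArrayList;
import java.util.List;

public class BuySmurfSketch {

    // Hypothetical driver that hides UI-level details from the scenario.
    static class Shop {
        final List<String> basket = new ArrayList<>();
        final List<String> orders = new ArrayList<>();

        // Low-level interactions: these do not belong in the feature text.
        void search(String term) { /* would type the term into the search box */ }
        void addToBasket(String item) { basket.add(item); }
        void checkoutAndPay() { orders.addAll(basket); basket.clear(); }

        // Business-level capability: the only thing the scenario needs to say.
        void buy(String item) {
            search(item);
            addToBasket(item);
            checkoutAndPay();
        }
    }

    public static void main(String[] args) {
        Shop shop = new Shop();
        shop.buy("Smurf");
        System.out.println(shop.orders); // prints [Smurf]
    }
}
```

The point of the sketch is only the shape: when the search box or the checkout flow changes, the change stays inside the helper and the wording of the scenario does not need to be touched.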

What should be tested?

Oh boy. I believe that being able to tell what should be tested and what doesn’t need to be tested is one of the most valuable skills a programmer can have. This, coupled with the ability to determine what type of test is required for a certain aspect of the behavior, makes a great programmer. And this is one of the biggest issues I have encountered while working with Cucumber: the pressure to test almost everything at the user acceptance test level, completely ignoring the scope of UATs.

Failing to properly manage the scope of your tests and the purpose of each type of test often leads to slow and brittle features filled with mocks, complex workarounds and many unnecessary step definitions. In extreme cases you might end up creating a custom solution for every new aspect of the application under test, making the framework hard to use and a pain to work with.

Is there something to be reused?

Who knows. BDD puts a lot of focus on communication, and if the communication doesn’t work, bad things start to happen. It is easy to overlook that Paul has already created the step definition you are currently working on. Or you simply chose a different name for the login page. Things like this happen. The lack of communication directly results in bloated step definitions and may cause differences in behavior when one of two seemingly identical steps performs some extra operations.

I am going to speak from my personal experience now, but I believe that tests should be expressive and explicit, which usually results in lower reuse potential (e.g. repeating yourself for the sake of readability). This, combined with my preference for less detailed features, results in less reuse compared to the more detailed, integration-style features usually starting with “Given I am signed as ‘user’ with ‘password’”. Remember that all the steps are always globally accessible. But whatever style you choose to implement, there will be some reuse potential.
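To illustrate how easily duplicates creep in when every step is globally visible, here is a toy registry in plain Java. This is not how Cucumber is implemented; the owner names PaulsLoginSteps and MyLoginSteps and both step patterns are hypothetical. It only shows that two independently written patterns can both claim the same step text.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class StepRegistrySketch {

    // Step pattern -> name of the (hypothetical) class that defines it.
    private final Map<Pattern, String> steps = new LinkedHashMap<>();

    void register(String regex, String owner) {
        steps.put(Pattern.compile(regex), owner);
    }

    // Returns every owner whose pattern matches the whole step text.
    List<String> ownersMatching(String stepText) {
        List<String> owners = new ArrayList<>();
        for (Map.Entry<Pattern, String> entry : steps.entrySet()) {
            if (entry.getKey().matcher(stepText).matches()) {
                owners.add(entry.getValue());
            }
        }
        return owners;
    }

    public static void main(String[] args) {
        StepRegistrySketch registry = new StepRegistrySketch();
        // Paul wrote this step first.
        registry.register("^I am signed in as '(\\w+)'$", "PaulsLoginSteps");
        // Someone else, unaware of Paul's step, wrote a slightly broader one.
        registry.register("^I am (?:signed|logged) in as '(\\w+)'$", "MyLoginSteps");

        // Both patterns match the same step text.
        System.out.println(registry.ownersMatching("I am signed in as 'alice'"));
        // prints [PaulsLoginSteps, MyLoginSteps]
    }
}
```

A check like this is a crude stand-in for what a conversation with Paul would have revealed much earlier.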

What are we doing here? Are these even acceptance tests?

Yes, this is also one of the questions you need to ask. Many developers focus on all possible aspects of the behavior instead of focusing on the value a given feature brings to the customer. Fellow blogger and developer Jack Kinsella put it this way:

Despite believing otherwise, most programmers never wrote a single acceptance test; instead they wrote integration tests using the Cucumber syntax.

And I couldn’t agree more. Based on what I saw in the project I worked on as well as some of the open source projects on GitHub, the integration testing approach seems to be very common in their feature files.

Is Cucumber supported by our IDE?

Now let’s focus on how hard/easy it is to actually create feature files. Since I am a Java programmer I can only speak about the stuff I personally experienced. Having worked with Cucumber JVM on both IntelliJ IDEA and Eclipse I can definitely say that user experience varies. I can’t speak for other IDEs since I haven’t used them yet, but one thing is sure – there are various levels of support from IDEs. Make sure that the preferred IDE of your team supports Cucumber and Gherkin since the support for this language as well as the Cucumber framework really makes a huge difference.

How much does it cost us?

Once again, when the process is missing and all that is left is Cucumber, testing this way is costly. This last point is influenced by almost all of the points above and also by the degree of ‘going rogue with Cucumber’ without adhering to BDD practices. Maintenance and refactoring (in case of changes to the application) are also a bit harder to do, and you end up spending more time compared to other ways of testing. The last thing to consider here is the environment you work in. Things do change in an agile environment. If you have already written the feature files for features that are affected by an unexpected change, you can end up throwing all that work out.

Final thoughts

And that is about as much as I can share at this point in my career. Despite the many issues I have experienced and discussed here, I believe that BDD and Cucumber can work. It is all about the people and their understanding of the process and the frameworks that support it. To be honest, I would love to find myself living in the magic kingdom. However, this hasn’t happened so far, so I am still hoping :) .

There is a term widely used on the Internet for a situation that bears the symptoms I have described in this article: falling into the Cucumber test trap. If you made it this far, please make sure that you don’t repeat the mistakes so many others have made before you. Falling into the Cucumber test trap is easy, but recovering is hard and costly. Recognize the difference between the process and the supporting frameworks, and don’t seek shortcuts.

Let me conclude this article by paraphrasing Tyler Durden (a character from David Fincher’s movie Fight Club) to describe what happens when the framework becomes more important than the ideas that led to its creation: The features that should drive your implementation end up reflecting it.