For the past few years, I have heard many people advocate using only automatic tests. For example, if all the automatic tests pass, then the code should automatically be deployed to production. I have always performed a bit of manual testing before feeling confident about my code. So for the past year I have paid extra attention to the bugs I have found while manually testing my own code. My conclusion: manual testing is still needed.

How I Test

At work, we have a production system and a test system. Each developer can also run the system on their own machine. When I develop a new feature, I write both unit tests and automatic integration tests. Typically, my unit tests concentrate on making sure the algorithms are correct. For example, calculating a margin call amount takes many input parameters, such as exposure amount, collateral amount and thresholds. The unit tests make sure the correct margin call amount is calculated for all the different combinations of input ranges. The integration tests use more of the infrastructure, such as the web server and the database. These tests make sure the correct amount is returned when the API is called. They test a larger chain of usage, to see that all the pieces fit together. All the details of the calculation are covered by unit tests, and the integration tests just make sure that the calculation works in its intended context.
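To make the unit-test side concrete, here is a minimal sketch. The formula and the names margin_call_amount, exposure, collateral and threshold are assumptions made up for illustration; the real calculation has more inputs and rules.

```python
def margin_call_amount(exposure, collateral, threshold):
    """Hypothetical, simplified rule (an assumption, not the real calculation):
    call for the part of the exposure that is covered neither by the
    collateral already held nor by the agreed threshold."""
    uncovered = exposure - collateral - threshold
    return max(0.0, uncovered)


# Unit tests in pytest style: plain functions with bare asserts.
def test_call_for_uncovered_exposure():
    assert margin_call_amount(100.0, 60.0, 10.0) == 30.0


def test_no_call_when_fully_covered():
    assert margin_call_amount(100.0, 95.0, 10.0) == 0.0


def test_no_call_exactly_at_threshold():
    assert margin_call_amount(100.0, 90.0, 10.0) == 0.0
```

An integration test would not repeat these combinations. It would call the API once or twice and check that an amount like the ones above comes back through the web server and the database.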

When I am done implementing a new feature, I will have both unit tests and automatic integration tests that have to pass. However, before I release the feature, I like to do some manual testing too. I start the complete system, with all the parts, including the GUI, on my machine. Then I try out the new feature in this context. I typically exercise the major parts of the feature, and check that everything is working as expected. I keep my eyes open for anything unusual or surprising. I also check the logs, to see that everything is as I expect. If I see anything weird or unusual, I dig deeper. Once this step is done, I consider my work on the feature finished.

Up until a few years ago, I just called this manual testing. But then I read Explore It! by Elisabeth Hendrickson, and I now think exploratory testing is a good description of what I do. The book starts by making a distinction between checking and exploring. Checking is when you know how the program should behave in a given circumstance, and you verify that it does. This is best done with automated tests. Exploring, on the other hand, is much more fluid. You try things, and let what you observe guide what you do next, while keeping your eyes open for potential bugs.

Why Manual Testing Is Needed

So, is exploratory testing necessary? I have been thinking about this on and off for the past year. There are two main reasons why I think it is useful and necessary. One argument is more philosophical, and the other is based on my experience.

1. Philosophical

I think the distinction between checking and exploring is very important. If you think automatic tests are enough, then you are essentially saying that checking is enough, and that we don’t need exploring. In my mind, that is the same as saying that we can come up with all the test cases we need in advance. I don’t think that is possible.

This is similar to software development in general. It is virtually impossible to come up with the whole design before starting. Instead, you need to take an iterative approach, learning along the way, and incorporating what you learn when taking the next step in developing the program. The same goes for coming up with all the tests needed without running the complete program: it is virtually impossible to think of all the cases.

The relation between exploratory testing and automatic integration tests is similar to the relation between integration tests and unit tests. The integration tests are needed to see that the logic tested in the unit tests fits into the bigger picture. The exploratory tests are done on the highest level – the complete system. These are needed to make sure that everything that has been tested automatically fits in and makes sense, not just by itself, but as a part of the whole.

So if you think automatic tests are enough, you are also saying that there is no need for exploratory testing. I think this is wrong – I think we need both checking and exploring, not just checking.

2. Experience

For the past year, I have paid extra attention to bugs I have found during manual testing. I have been trying to cover everything with automatic tests, but I have still found bugs when exploring the complete system. Similarly to how I keep track of interesting bugs, I have kept track of these instances of finding bugs that slipped through the checking, but were found when exploring. In total, it happened approximately once a month. Here are two examples:

1. String versus float. I added a feature that would stop certain margin calls based on inconsistent amounts. In one case this meant checking whether the call amount was greater than the collateral amount. It was straightforward to write automatic tests for the different scenarios. However, when trying one case through the GUI, I got a surprising result. It looked like the test “if 8 > 9” was true! When investigating, I found that the Python code compared the string ‘8’ to the float 9. In Python 2.7, this is not an error. Instead the comparison evaluates to True, because Python 2 orders numbers before strings when comparing mixed types (Python 3 raises a TypeError). For historical reasons, the GUI at that time sent all values as strings. But in the back-end, numbers were stored as floats. This is why the type mismatch happened in the comparison, leading to the wrong result. In all my automatic tests, I had assumed floats in API calls from the GUI, but that turned out to be incorrect.

2. What should not happen. Previously, the agreement currency and agreement type were set once on a position, and could not be changed after that. We wanted to make it possible to change the values if no margin call had been initiated. Again the code was quite easy to write and test. However, when testing on the complete system, I noticed that when I only changed the agreement currency, the agreement type got cleared to None. This was wrong. In my tests, I asserted that each change had the desired effect. However, I did not assert that other values did not change. But when trying this in the GUI, it was immediately obvious that more than one value was changed.
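As an illustration of the first example, here is a minimal sketch of one way to guard against the type mismatch: convert amounts to floats at the boundary before comparing them. The function names parse_amount and call_exceeds_collateral are hypothetical and not taken from the actual system.

```python
def parse_amount(raw):
    """Convert an amount from a client request (string or number) to float."""
    if isinstance(raw, bool):
        # bool is a subclass of int; reject it so True is not read as 1.0.
        raise TypeError("amount must be a number or numeric string")
    return float(raw)  # raises ValueError for non-numeric strings such as "abc"


def call_exceeds_collateral(call_amount, collateral_amount):
    """Compare two amounts numerically, whatever type they arrived as."""
    return parse_amount(call_amount) > parse_amount(collateral_amount)


def test_string_amount_from_gui_is_compared_as_number():
    # The GUI sent "8" as a string; the stored collateral was the float 9.0.
    assert call_exceeds_collateral("8", 9.0) is False
    assert call_exceeds_collateral("10", 9.0) is True
```

The test only exists because the exploratory session revealed that strings arrive from the GUI. Before that, there was no reason to think of it.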

The point here isn’t that it is impossible to come up with automatic tests that catch these bugs. The point is that in these cases, I didn’t come up with them. You could equally argue that there should never be any bugs in your code, because you could have thought about the cases beforehand, and handled them. In theory it is possible, but in practice bugs happen. The same is true for coming up with tests. Exploratory testing is a relatively cheap way of catching bugs that are hard to catch with automatic tests (because you failed to think of the cases). In my experience, the bugs that are hard to anticipate when writing test cases often become obvious when using the system as a whole.
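For the second example, a test written after the fact could assert not only what changed but also that nothing else did. The sketch below is an assumption-laden illustration: the Position class, its fields, the update_position function and the value "CSA" are all hypothetical stand-ins, not the real code.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class Position:
    # Hypothetical, heavily simplified position record.
    agreement_currency: str
    agreement_type: str
    margin_call_initiated: bool = False


def update_position(position, **changes):
    """Return a copy with the given fields changed; all others are kept."""
    if position.margin_call_initiated:
        raise ValueError("cannot change a position after a margin call")
    return replace(position, **changes)


def assert_only_changed(before, after, expected_changes):
    """Fail if any field differs from its expected new (or unchanged old) value."""
    after_fields = asdict(after)
    for name, old_value in asdict(before).items():
        expected = expected_changes.get(name, old_value)
        assert after_fields[name] == expected, (
            f"{name}: expected {expected!r}, got {after_fields[name]!r}"
        )


def test_changing_currency_leaves_type_alone():
    before = Position(agreement_currency="USD", agreement_type="CSA")
    after = update_position(before, agreement_currency="EUR")
    assert_only_changed(before, after, {"agreement_currency": "EUR"})
```

A helper like assert_only_changed would have caught the agreement type being cleared to None, but I only thought of writing it after seeing the bug in the GUI.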

I think one reason for these bugs is that the automatic tests are focused on the code, whereas the exploratory tests are focused on the behavior of the system. Many of the bugs I found at the manual testing stage were caused by assumptions I had made that were not true. I also noticed that often the manual testing made me think of additional tests to run.

Elisabeth Hendrickson has the same experience. Early in Explore It! she quotes (and agrees with) a coworker who observed: “No matter how many tests we write, no matter how many cases we execute, we always find the most serious bugs when we go off the script.” This is my experience too.

Notes

Doing manual testing doesn’t dictate whether you use a staging system or a local system. It is even possible to do the manual testing in production. If you use feature flags, you can deploy new features to production, but restrict access to them until they have been manually tested.
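A feature flag can be as simple as a lookup that decides who sees the new behavior. The sketch below is a minimal illustration with a hypothetical in-memory flag table and made-up flag and user names. Real setups usually keep flags in configuration or a dedicated service.

```python
# Hypothetical in-memory flag table: flag name -> users allowed to see the feature.
FEATURE_FLAGS = {
    "editable_agreement_fields": {"tester@example.com"},
}


def is_enabled(flag, user, flags=FEATURE_FLAGS):
    """Return True if the feature is switched on for this user."""
    return user in flags.get(flag, set())


def can_edit_agreement_fields(user):
    """Gate the new behavior; everyone else keeps the old, read-only behavior."""
    return is_enabled("editable_agreement_fields", user)
```

With this in place, the code can be deployed to production while only the person doing the manual testing can reach the new feature.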

You can also manually test and still automatically deploy everything that is merged to master to production. In that case you need to do the manual testing in the branch, and only merge to master when the new feature is both checked and explored.

Also note that manual testing doesn’t necessarily mean there is a GUI. You can do manual testing even if the only access to the system is via APIs. In that case, the tests consist of using the system as a whole, the way its users do. You use the system to accomplish a task, which is often more than simply making an API call and getting a response.

Finally, doing some manual testing does not mean big, batched, infrequent releases. Manual testing works well with continuous delivery. However, there will always be bugs that slip through. Manual testing is a different kind of testing compared to automatic tests, and it helps eliminate some bugs. For the bugs that still slip through, we must be quick to troubleshoot and fix.

Conclusion

My current view on testing is that manual testing is a valuable complement to automatic tests. My experience is that I regularly find severe bugs this way, and that the cost of doing this testing is low. I am interested in hearing other perspectives and experiences. How do you do testing? Do you agree or disagree with my conclusion? Let me know in the comments.