To improve the stability of Fauxton, we wrote a lot of Selenium tests to catch regressions in the user experience. These tests have been incredibly useful in helping us make sure that Fauxton always does what it is supposed to do and that no commit breaks existing behaviour. We run them on Travis CI for every pull request. They have also helped us find a few bugs in CouchDB.

However, these tests have become harder and harder to maintain. Often they would pass locally on our machines but then fail on Travis CI, often because the Firefox version that Selenium was testing against on Travis was different from the one on our local machines. I personally could only get the tests to run on Chrome, so I had to ask Robert Kowalski to do my debugging for me, which was a big time-waster for him and a huge frustration for me.

Another issue was that the screen resolution could differ between our local machines and Travis CI. This meant that sometimes elements were not visible to the Selenium test runner on Travis CI, so a test would fail there but pass locally. This resulted in a lot of cursing and possibly excess drinking. All of this made me question whether we should even have these tests. I was very close to just giving up on them all.

Robert convinced me to give it one more try. But we needed to fix the setup. We needed a way to have a consistent testing experience on our development machines as well as on Travis CI. We needed to make testing as easy and frictionless as possible. This would hopefully fix all the inconsistencies we had and then make fixing these tests less time-consuming.

After some stackoverflowing and googling, I stumbled across a set of beautifully handcrafted Selenium Docker images made by the incredibly helpful Selenium team. The set includes images for running standalone versions of Chrome and Firefox, as well as images for nodes in a Selenium Grid.

The first step was to create a consistent environment that could be set up and torn down quickly. I used Docker Compose and created a compose file that starts a Selenium server and a CouchDB 2.0 instance. We then run a Fauxton server that the Docker container can interact with (I will try out running the Fauxton server in another Docker container at some point). We use Nightwatch.js to write our tests, and we have a command to run either the full test suite or just one file. This gives us the basics, so we can now run the tests in a consistent fashion on Travis CI and locally.
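As a rough illustration of the setup described above, Nightwatch can be told not to start its own Selenium process and to talk to the one running in the container instead. This is a minimal sketch, not Fauxton's actual configuration: the `buildSettings` helper, the environment variable names, the `tests` folder and the default ports (4444 for Selenium, 8000 for the Fauxton server) are all assumptions made for the example, and the option names follow the Nightwatch configuration format that was current when this was written.

```javascript
// nightwatch.conf.js (hypothetical file; not Fauxton's real configuration)

// Build Nightwatch settings that point at a Selenium server that is
// already running, for example in a Docker container.
// All environment variable names and default values are assumptions.
function buildSettings(env) {
  return {
    src_folders: ['tests'], // hypothetical test directory
    selenium: {
      // Docker Compose starts Selenium, so Nightwatch must not start one.
      start_process: false
    },
    test_settings: {
      default: {
        // Address of the Fauxton server; the browser in the container
        // must be able to reach it.
        launch_url: env.FAUXTON_URL || 'http://localhost:8000',
        // Where Nightwatch finds the Selenium server.
        selenium_host: env.SELENIUM_HOST || 'localhost',
        selenium_port: Number(env.SELENIUM_PORT || 4444),
        desiredCapabilities: {
          browserName: env.BROWSER || 'firefox'
        }
      }
    }
  };
}

module.exports = buildSettings(process.env);
```

Because the browser runs inside the container, `FAUXTON_URL` has to be an address the container can reach, which is usually not `localhost` unless the container shares the host's network.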

But how do we debug failed tests? The Selenium team have also built a debug version of the images that allows us to VNC into the container to see what is happening. Genius!

So we created another compose file that runs the debug Selenium image. We can then open a VNC client and watch the test run. We can interact with the browser and, most importantly, open the Firefox Dev Tools to see what is happening. Full-scale consistent testing everywhere!
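Switching between the two compose files can be wrapped in a small helper so that nobody has to remember the file names. The sketch below only builds the argument list for `docker-compose`; both file names and the `composeUpArgs` helper are hypothetical, not the names used in the Fauxton repository.

```javascript
// Choose the arguments for `docker-compose` depending on whether we want
// the normal Selenium image or the debug image.
// Both file names are hypothetical.
const COMPOSE_FILES = {
  normal: 'docker-compose.yml',
  debug: 'docker-compose.debug.yml'
};

function composeUpArgs(debug) {
  const file = debug ? COMPOSE_FILES.debug : COMPOSE_FILES.normal;
  // `-f` selects the compose file; `up -d` starts the services in the background.
  return ['-f', file, 'up', '-d'];
}

// Example: print the command to run for a debugging session.
console.log('docker-compose ' + composeUpArgs(true).join(' '));
// prints "docker-compose -f docker-compose.debug.yml up -d"
```

Once the debug container is up, a VNC client can connect to whichever port the compose file maps to the container's VNC server (5900 is the standard VNC port).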

We have been running the tests like this for the last few weeks, and already it’s been easier to write and run them. It has actually made it fun to write Nightwatch.js tests again. I never thought I would ever say that.

If you want to try this out for yourself with Fauxton, read Getting Started with Fauxton and Running the Tests here.

There is one issue we experienced with Selenium. The latest debug images were failing due to an issue with the screen setup. There is a fix for it, and hopefully it will land soon so that we can upgrade to the latest and greatest version of Selenium. For now, the easiest way around it is to run a slightly older version of Selenium.

Running our tests with Docker and Selenium has simplified our testing significantly and has been a much better experience. Now, instead of cursing, I find myself crying tears of joy as I consider the beauty of the tests running quietly in the background while I work on new functionality. I’m still drinking the same amount though… But seriously, it has allowed us to focus on building our features and making improvements to the codebase, rather than spending days fixing failing tests.