Two YEARS!

On the 6th of November 2012 I finally decided to open source an insane idea I'd had about 10 months earlier, an idea that had seemed so obvious, yet so impossible: to test the UI as a person would, by seeing. An automated visual regression test suite.

This wasn’t a new idea; there had been many a boozy pub chat about how amazing it would be. But at that point in time I did not know of any working implementations. I found out later that it had been tried successfully, at least twice, but the idea had never become popular.

I remember thinking, “this will never work, it’ll be too unstable” – but it did (kinda, mostly) work.

Background

I had spent the previous three years trying to improve and legitimize front-end testing at Huddle. We had long relied on Selenium end-to-end scripts for UI testing, but they were incredibly slow and fragile. The biggest pain point for me was the tight coupling of XPath selectors to UI elements; specifically, direct coupling to CSS class names. Renaming .button to .btn would cause the tests to fail, but accidentally dropping the border-radius would let them pass. After some fury I eventually found a way to mitigate that test coupling.

So we stopped using Selenium for pure UI testing (but kept it for end-to-end) and brought in Jasmine for unit testing, taking a Mockist rather than Classical approach. I tried to test each module in isolation by mocking its dependencies. The mocks had to implement the source module’s methods, so if I changed mod.raiseEvent() to mod.emit(), the tests would fail, for no good reason, until I changed the mock too. That’s another form of implementation coupling, one that will turn a perfectly good test suite into an expensive code-change alert system. Cheaper change detection systems have been made.

But what if we could isolate the UI as a whole? Enter the headless browser: a test double and a navigation-and-testing utility in one. Reducing coupling by using XHR stubs and spies, and defining a testing contract with the UI markup, was nothing short of a revolution in how we tested the front-end. Uncoupled, isolated UI unit tests have an interesting but obvious side effect: complete control of the UI under test. PhantomJS can take screenshots, and from those I could see that the UI component under test always looked the same, because my test had complete control of the UI.

It always looks the same, unless it changes.
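The isolation idea can be sketched in plain JavaScript. Everything here is illustrative rather than real Huddle code: UserBadge is a hypothetical view module that takes its transport as an injected dependency, and stubTransport is a hand-rolled test double standing in for XHR.

```javascript
// Minimal sketch of UI isolation with a test double. The view module
// only knows its transport's contract: get(url, callback).
function UserBadge(transport) {
  this.transport = transport;
  this.html = '';
}

UserBadge.prototype.render = function (userId, done) {
  var self = this;
  this.transport.get('/users/' + userId, function (user) {
    self.html = '<span class="badge">' + user.name + '</span>';
    done(self.html);
  });
};

// In the test, a stub transport replaces real XHR, so the UI under
// test always receives the same data and renders the same markup.
var stubTransport = {
  get: function (url, callback) {
    callback({ name: 'Ada' }); // canned response, no network involved
  }
};

var badge = new UserBadge(stubTransport);
badge.render(42, function (html) {
  console.log(html); // deterministic markup, ready to screenshot
});
```

Because the stub answers synchronously with canned data, the rendered markup is identical on every run, which is exactly what makes a screenshot of it comparable.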

PhantomJS for CSS testing

Visual testing suddenly became cheap to implement and excitingly viable. The first cut of PhantomCSS didn’t produce diff images; it just reported the percentage difference on the command line, and that was perfectly OK. Initially we used a product called Beyond Compare to visually compare the before and after screenshots.
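The percentage-difference idea can be illustrated with a toy comparison. This is an illustration of the concept, not PhantomCSS’s actual diff algorithm; real image diffing works on decoded RGBA pixel data.

```javascript
// Toy diff: given two equal-length arrays of greyscale pixel values
// (0-255), report the percentage of pixels that differ by more than
// a small tolerance.
function percentDifference(before, after, tolerance) {
  if (before.length !== after.length) {
    throw new Error('Screenshots must be the same size to compare');
  }
  var mismatched = 0;
  for (var i = 0; i < before.length; i++) {
    if (Math.abs(before[i] - after[i]) > tolerance) {
      mismatched++;
    }
  }
  return (mismatched / before.length) * 100;
}

var baseline = [0, 0, 255, 255];
var current  = [0, 0, 255, 128]; // one of four pixels changed
console.log(percentDifference(baseline, current, 16)); // 25
```

The tolerance matters: a small allowance absorbs antialiasing noise between runs, while a real layout change still pushes the mismatch percentage well above it.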

Visual testing with a team of between two and eight people actively using the test suite presented some difficulties, the same difficulties we encountered with CI integration. Graphical and performance differences between machines and operating systems cause false negatives: visual differences that aren’t bugs. Usually innocuous things like scroll bars and checkboxes look different on different systems. Fonts are antialiased or smoothed differently, sometimes in greyscale, sometimes with colours. Then there’s latency, even if the system under test is served from localhost; a one-microsecond difference in image load time between machines can cause a false negative test result. CSS animations too: if your colleague’s machine is faster and has more RAM, your test results will not be the same when run on each other’s machines.

Most of these problems can be solved by ensuring that the test has waited properly, and I’m not talking about thread.sleep(1000). I mean wait for the element to appear, wait for the image to load. CasperJS provides lots of useful methods for waiting, and for asserting that elements exist or don’t exist. For things like native scrollbars and animations, hiding and stopping is the only way to go. PhantomCSS provides a method for switching off CSS and jQuery animations, and hiding or removing scrollbars is trivial with casper.evaluate.
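Put together, a stabilised test step might look like the sketch below. It runs under the CasperJS/PhantomCSS runtime, not plain Node; phantomcss.turnOffAnimations, phantomcss.screenshot, phantomcss.compareAll, casper.waitForSelector and casper.evaluate are from those libraries, while the URL, selector and injected stylesheet are hypothetical.

```javascript
// Sketch only: assumes a CasperJS test runner with PhantomCSS loaded.
phantomcss.turnOffAnimations(); // stop CSS and jQuery animations

casper.start('http://localhost:8000/styleguide.html');

casper.then(function () {
  // Hide native scrollbars so they can't differ between machines.
  casper.evaluate(function () {
    var style = document.createElement('style');
    style.textContent = '::-webkit-scrollbar { display: none; }';
    document.head.appendChild(style);
  });
});

// Wait for the component to appear rather than sleeping for a
// fixed time, then screenshot just that element.
casper.waitForSelector('.dialog', function () {
  phantomcss.screenshot('.dialog', 'dialog-default');
});

casper.then(function () {
  phantomcss.compareAll(); // diff against the stored baselines
});

casper.run(function () {
  casper.test.done();
});
```

Screenshotting a single element rather than the whole page also shrinks the surface area on which unrelated rendering differences can trigger a false negative.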

IT’S REALLY GOOD!

Whatever the problems, automated visual regression testing is an amazing enhancement and complement to other UI testing techniques. I now have the freedom to aggressively refactor CSS in a large application. I can apply TDD-like practices to CSS component development. I have more freedom to aggressively refactor JS because of implicit coverage of client-side templates. I have a referenceable visual baseline of what has been signed off by the design and product teams. Did I mention the implicit coverage? Think about it: HTML, CSS and JS, all in one visual test. I write FEWER tests, because I can focus the purely functional tests on concise breakage points and have a visual test for everything else. Visual tests complement functional tests; often a visual test failure will coincide with a functional test failure, helping me locate the change or bug quickly. Visual testing can also easily bring rich test coverage to static websites, component libraries and live style guides.

Your automated UI tests shouldn’t cost you hours to maintain; they shouldn’t break when you rename a function or change a CSS classname; they shouldn’t take hours to return results. They shouldn’t stop you from doing your real job, building an awesome UI.

Perhaps it’s time to give automated visual regression testing a try?