Testing In Production The Mad Science Way:

Circuit Breakers & Science Experiments

By James Espie

Testing In Production

Deploying big changes to production can be scary. Even when you’ve done as much testing as you possibly can, you may still not feel comfortable making changes in production. Some changes have a lot of edge cases, or are very high risk.



There are many ways of mitigating this risk once you’ve deployed to production. Two models that make major code changes in production safer are the Circuit Breaker and the Science Experiment. Both let you manage, in real time, the risk that new code poses to functionality that is critical to system operations or business functions. Let’s briefly look at how the two methods work, when to use them, and the challenges they bring for testing.

Circuit Breaker Model

Evolution Of The Circuit Breaker Model

The circuit breaker model was popularised by Martin Fowler in 2014. It’s analogous to circuit breakers in electronics. In electronics, circuit breakers exist to prevent damage by interrupting the flow of electricity. Similarly, in development, a circuit breaker exists to interrupt the flow of data.

The initial concept, as you’ll see in Fowler’s blog post, was to prevent running out of resources when a remote call was continuously failing. After reaching a particular threshold of failures, the circuit breaker would trip, log a notification, and simply stop trying to make the remote call.
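Here is a minimal Python sketch of that original idea. The class and exception names are hypothetical, and the default threshold of three is arbitrary. Fowler’s post also describes a timed "half-open" state that retries the remote call automatically; this sketch leaves that out.

```python
import logging
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("circuit_breaker")


class CircuitOpenError(Exception):
    """Raised instead of making the remote call once the breaker has tripped."""


class ThresholdCircuitBreaker:
    """Stops calling a failing remote service after `failure_threshold`
    consecutive failures."""

    def __init__(self, failure_threshold: int = 3) -> None:
        self.failure_threshold = failure_threshold
        self.failure_count = 0
        self.tripped = False

    def call(self, remote_call: Callable[[], T]) -> T:
        if self.tripped:
            # Circuit is open: fail fast without touching the remote service.
            raise CircuitOpenError("circuit open; remote call skipped")
        try:
            result = remote_call()
        except Exception:
            self.failure_count += 1
            if self.failure_count >= self.failure_threshold:
                self.tripped = True
                log.warning("Circuit tripped after %d consecutive failures",
                            self.failure_count)
            raise  # the caller still sees the original error
        self.failure_count = 0  # any success resets the consecutive-failure count
        return result
```

Once tripped, the breaker never calls `remote_call` again. That is the "stop trying" behaviour described above: it protects the caller's resources at the cost of failing fast.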

This circuit breaker model can be applied in different ways - it’s particularly useful when updating or replacing old functionality. Let’s dive in a bit deeper.

How The Circuit Breaker Model Works

The Circuit Breaker model requires a few things:

Two code paths: the first code path is the new function or feature that we’re deploying; the second code path is the pre-existing code - the old code that we’re intending to replace

A circuit breaker ‘flag’ - this determines which of the two code paths to take

If the flag is set to true, we use the new code that we’ve just deployed. We keep the second code path, the old code, in place as a fallback. The new code path is monitored closely. If it errors or fails for any reason, the circuit breaker is triggered and the flag is flipped from true to false. With the flag set to false, the new code is no longer used and the feature falls back to the old code path.
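Below is a minimal Python sketch of the two-path arrangement. It assumes the old and new code paths are plain functions with the same signature. `TwoPathCircuitBreaker` and `use_new_path` are hypothetical names. A real system would usually keep the flag in shared configuration so every server sees the same value, rather than in memory as here.

```python
import logging
from typing import Any, Callable, Generic, TypeVar

T = TypeVar("T")
log = logging.getLogger("circuit_breaker")


class TwoPathCircuitBreaker(Generic[T]):
    """Routes calls to the new code path while the flag is True, and falls
    back to the old code path for good once the new path fails."""

    def __init__(self, new_path: Callable[..., T], old_path: Callable[..., T]) -> None:
        self.new_path = new_path
        self.old_path = old_path
        self.use_new_path = True  # the circuit breaker 'flag'

    def call(self, *args: Any, **kwargs: Any) -> T:
        if self.use_new_path:
            try:
                return self.new_path(*args, **kwargs)
            except Exception:
                # Trip the breaker: flip the flag so later calls skip the new code.
                self.use_new_path = False
                log.exception("New code path failed; circuit breaker tripped")
        # Flag is False (or has just become False): use the old, known-good code.
        return self.old_path(*args, **kwargs)
```

The request that trips the breaker is itself answered by the old path. The user who hit the error therefore still gets a result.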

Circuit Breaker - An Example

Let’s say we have a service (service X) that fetches prices for products. It’s a really important service: if it stops working, we can’t sell anything! Service X is being deprecated and replaced with a new service, service Y. Service Y does the same thing, but is more efficient and more secure.

If we simply removed service X and replaced it with service Y, we would be betting everything on service Y being faultless. If service Y failed for some reason, this could be costly! The circuit breaker method protects us from this by keeping service X around as a fallback.

In the diagram above, we are using service Y and monitoring its responses. While service Y keeps returning ‘success’ responses, the new code appears to be working and the system can keep using it.

When service Y returns an error, we know something may be wrong with the code. At this point we trip the circuit breaker by setting the flag to false, which returns the system to the service X version of the code.

From here on the system can use the old service X. It’s less efficient and less secure than our new service, but we know it works.

This fast switch back to service X gives us time to investigate what caused the error in service Y and fix it. Once the error is resolved, we can deploy an updated version of service Y, flip the flag back to true, and start using service Y again.
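The whole price example, including the fix and redeploy, can be sketched in a few lines. Everything here is hypothetical:

- `fetch_price_x`, the two versions of service Y and the hard-coded prices stand in for real network calls.
- `redeploy` stands in for shipping a fix and flipping the flag back to true.

As in the text, a single error from service Y is enough to trip the breaker.

```python
from typing import Callable

# Hypothetical price data standing in for real service responses.
PRICES = {"widget": 9.99, "gadget": 24.50}


def fetch_price_x(product: str) -> float:
    """Old service X: less efficient and less secure, but known to work."""
    return PRICES[product]


def fetch_price_y_buggy(product: str) -> float:
    """New service Y as first deployed: fails for one product."""
    if product == "gadget":
        raise RuntimeError("service Y: unexpected response")
    return PRICES[product]


def fetch_price_y_fixed(product: str) -> float:
    """Service Y after the investigation and fix."""
    return PRICES[product]


class PriceClient:
    def __init__(self, service_y: Callable[[str], float]) -> None:
        self.service_y = service_y
        self.use_service_y = True  # the circuit breaker flag

    def get_price(self, product: str) -> float:
        if self.use_service_y:
            try:
                return self.service_y(product)
            except Exception:
                self.use_service_y = False  # trip: fall back to service X
        return fetch_price_x(product)

    def redeploy(self, service_y: Callable[[str], float]) -> None:
        """Deploy an updated service Y and flip the flag back to True."""
        self.service_y = service_y
        self.use_service_y = True
```

Here is how the lifecycle plays out with a client built on `fetch_price_y_buggy`:

1. `get_price("gadget")` trips the breaker and returns service X's price.
2. Service X keeps answering from then on.
3. After `redeploy(fetch_price_y_fixed)`, the same call is served by service Y again.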

Valid Reasons To Use A Circuit Breaker Model

The lights stay on when we use the circuit breaker method to test new code safely in production. Because the original, working service is still there as a fallback, users will not experience errors caused by the new service for very long: the fallback acts as a quick fix for the issue occurring in production.

An outdated, but functional service, is better than one that is completely unavailable.

Using the circuit breaker model we can learn about our product and how it behaves in production. We can then work to improve it, with our user base mostly unaware that anything went wrong.

Risks Using A Circuit Breaker Model

There are some caveats to be aware of when using the circuit breaker approach.

One thing to watch out for is when the circuit is broken by something other than the code - a network outage for example.

If the culprit can’t be identified easily, a developer might end up spending a lot of time trying to find a code problem that simply isn’t there. It’s important to have a good logging and monitoring strategy to mitigate this. Detailed error messages, HTTP error codes, and any other relevant logging should be included and tagged with some kind of correlation ID if possible.
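One way to make a trip easy to diagnose is to log a single structured record per trip. This is a sketch under a few assumptions:

- Log lines are JSON.
- The caller supplies a correlation ID; a random one is generated if none is passed.
- The field names are illustrative, not a standard.

Recording the exception type helps tell a `ConnectionError` (possibly a network problem) from a `ValueError` (more likely a code problem).

```python
import json
import logging
import uuid
from typing import Optional

log = logging.getLogger("circuit_breaker")


def log_breaker_trip(error: Exception,
                     http_status: Optional[int] = None,
                     correlation_id: Optional[str] = None) -> dict:
    """Emit one structured record describing why the breaker tripped.

    The correlation ID ties this record to the request's other log lines,
    so the trip can be traced back through the rest of the system."""
    record = {
        "event": "circuit_breaker_tripped",
        "correlation_id": correlation_id or str(uuid.uuid4()),
        "error_type": type(error).__name__,
        "error_message": str(error),
        "http_status": http_status,
    }
    log.error(json.dumps(record))
    return record
```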

Another thing to be careful with is the mechanism for resetting the circuit. It could be configured so that it only resets on a new deployment. Or there could be some tool that resets it, or it could even be reset by simply updating a config file. Whatever the method of resetting the circuit, be careful that it can’t be done accidentally. If the circuit is being turned on and off inadvertently, it makes it much harder to diagnose problems.
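Here is one illustration of a reset that can’t happen by accident. The breaker only re-enables the new code path once a different version has been deployed from the one that tripped it. The version strings and method names are hypothetical. The same rule could sit behind an admin tool or a config file instead.

```python
from typing import Optional


class DeployGuardedBreaker:
    """Circuit flag that can only be reset by a deployment different from
    the one that tripped it, so a stray config edit or script run can't
    silently turn the new code back on."""

    def __init__(self, deployed_version: str) -> None:
        self.deployed_version = deployed_version
        self.use_new_path = True
        self.tripped_in_version: Optional[str] = None

    def trip(self) -> None:
        self.use_new_path = False
        self.tripped_in_version = self.deployed_version

    def on_deploy(self, new_version: str) -> None:
        self.deployed_version = new_version

    def reset(self) -> bool:
        """Re-enable the new code path; refused if nothing new has shipped."""
        if self.tripped_in_version is None:
            return True  # never tripped, so there is nothing to reset
        if self.deployed_version == self.tripped_in_version:
            return False  # same build that failed: refuse the reset
        self.use_new_path = True
        self.tripped_in_version = None
        return True
```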

Science Experiment Model

There are some variations of the ‘science experiment’ model, including the Scientist tool from GitHub.

There are a few similarities to the circuit breaker model. Here are the requirements for the science experiment model:

Two code paths: the new code we’re intending to deploy, and the pre-existing code that we’re replacing

A third function that compares the results from our old and new code

Unlike in the circuit breaker model, the new code path isn’t live yet. The system keeps using the old code while the new code runs silently in the background. Both paths are exercised, and their results are evaluated by the comparator function, which exists purely to compare and monitor the two code paths.
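Below is a minimal Python sketch of this structure. It is not the API of GitHub’s Scientist, which is a Ruby library. `Experiment`, `control` (the old code) and `candidate` (the new code) are names chosen for this example. For simplicity the candidate runs synchronously right after the control; a production version might run it asynchronously so it can’t slow users down.

```python
import logging
import operator
from typing import Any, Callable, Generic, TypeVar

T = TypeVar("T")
log = logging.getLogger("science_experiment")


class Experiment(Generic[T]):
    def __init__(self, name: str,
                 control: Callable[..., T],
                 candidate: Callable[..., T],
                 compare: Callable[[T, T], bool] = operator.eq) -> None:
        self.name = name
        self.control = control      # old code path: still the live one
        self.candidate = candidate  # new code path: runs silently
        self.compare = compare      # comparator function
        self.runs = 0
        self.mismatches = 0

    def run(self, *args: Any, **kwargs: Any) -> T:
        control_result = self.control(*args, **kwargs)
        self.runs += 1
        try:
            candidate_result = self.candidate(*args, **kwargs)
        except Exception:
            # A crash in the new code is a finding, never a user-facing error.
            self.mismatches += 1
            log.exception("%s: candidate raised", self.name)
            return control_result
        if not self.compare(control_result, candidate_result):
            self.mismatches += 1
            log.warning("%s: mismatch control=%r candidate=%r",
                        self.name, control_result, candidate_result)
        return control_result  # users always get the old code's answer
```

The default comparator is plain equality. A custom `compare` function can be passed when, for example, the order of results shouldn’t matter.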

We look for problems by comparing the output from the two code paths. If the output from the new code (B) differs from the baseline output of the old code (A), then we may have found a problem.

This approach is about finding problems before the new code goes live, rather than recovering from them afterwards. Do code paths A and B both return the same result? If not - why not?

Science Experiment - An Example

Let’s say we have an algorithm that searches through a table of users and returns the single result that best matches the search criteria. The algorithm works, but it performs slowly, so we have built a new version of the search algorithm to replace it.

It’s a significant rewrite, and we want to be sure we have improved the speed without affecting the functionality. In other words, the search should still return the same result using the new algorithm as it would have with the old. (Let’s assume the performance improvement is a given - although we could use a science experiment to test for that, too.)

A science experiment can help us do this. Let’s call the old search function X and the new one Y.

When we run function X, we also silently run function Y in the background. Then we compare the result set of both searches. If they’re the same, great! If they’re different, then something about our new search function isn’t right. We log an error or alert describing the problem. Once we’ve found the issue in our new search, we can improve it, redeploy, and continue the experiment.
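The search example might look like this in Python. All of the following are hypothetical:

- the users table;
- the rule that the "best match" is the alphabetically first name starting with the search text;
- both implementations.

Y replaces X’s full scan with a binary search over a sorted copy. That is the kind of rewrite where a subtle behavioural difference could slip in.

```python
import bisect
import logging
from typing import List, Optional

log = logging.getLogger("search_experiment")

# Hypothetical users table.
USERS = ["bob", "alicia", "carol", "alice", "bobby"]


def search_x(users: List[str], prefix: str) -> Optional[str]:
    """Old search X: scan every row, return the alphabetically first match."""
    matches = [u for u in users if u.startswith(prefix)]
    return min(matches) if matches else None


class SearchY:
    """New search Y: sort once, then binary-search for the first match."""

    def __init__(self, users: List[str]) -> None:
        self.sorted_users = sorted(users)

    def __call__(self, prefix: str) -> Optional[str]:
        # Names starting with `prefix` form a contiguous run in sorted order,
        # beginning at the first position where `prefix` could be inserted.
        i = bisect.bisect_left(self.sorted_users, prefix)
        if i < len(self.sorted_users) and self.sorted_users[i].startswith(prefix):
            return self.sorted_users[i]
        return None


def search(prefix: str, search_y: SearchY) -> Optional[str]:
    """Run X for real, run Y silently, and log an error if they disagree."""
    result_x = search_x(USERS, prefix)
    result_y = search_y(prefix)
    if result_x != result_y:
        log.error("search mismatch for %r: X=%r Y=%r", prefix, result_x, result_y)
    return result_x  # the user only ever sees X's answer
```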

Reasons To Use The Science Experiment Model

The science experiment approach can add layers of confidence to functional testing. This is because the code you are testing is now being exercised by real users. By running the code silently and monitoring the results, you can catch any unexpected results ahead of time. If something unexpected occurs, then a developer can diagnose the issue. If they find that the new function is flawed, they can fix it, redeploy, and start the experiment again. Ultimately it means that if a bug is found, it can be corrected ahead of the function going live.

The experiment needs to remain on in production for a while, but exactly how long can depend on a couple of different factors.

One factor is the frequency of use of the feature. If the feature gets used frequently, then it’s going to get more results faster, and so can be run for a shorter period of time. If the feature is used infrequently, then it may be wise to leave the experiment running for longer.

Another factor is the risk involved with the feature. If it’s a high risk piece of work, then again, maybe the experiment should run for longer. If it’s less risky, the time can be reduced.

As a rough guide, a typical short experiment might run in production for a couple of days, a longer one could stay on for weeks, or even a month.

Risks Using The Science Experiment Model

Using the science experiment model also carries with it some risk. One risk is not getting enough results through. If you run an experiment for a while, but the feature only gets used by production users a few times, it may not have been a good use of time.

It’s important to do due diligence first to ensure that the feature is one that gets used frequently enough that it will provide enough data in production. Sometimes a feature only gets high usage on a particular day of the week or month. In these cases, it’s important to ensure that these high-usage days are included in the time period of the experiment.
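Before switching an experiment off, a team could check both conditions mechanically:

- there have been enough comparisons overall;
- at least one comparison fell on each high-usage day.

This sketch assumes the experiment records the date of each comparison. The threshold and the weekday list are placeholders to set per feature. A month-end peak would need a day-of-month check instead.

```python
from collections import Counter
from datetime import date
from typing import Iterable


def experiment_has_enough_data(comparison_dates: Iterable[date],
                               min_comparisons: int,
                               required_weekdays: Iterable[int]) -> bool:
    """True once the experiment has logged at least `min_comparisons`
    comparisons AND at least one fell on each required weekday
    (Monday == 0 ... Sunday == 6, as returned by date.weekday())."""
    per_weekday = Counter(d.weekday() for d in comparison_dates)
    total = sum(per_weekday.values())
    return total >= min_comparisons and all(
        per_weekday[w] > 0 for w in required_weekdays
    )
```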

When a mismatch is logged, it can also be tempting to assume the new code is at fault. That may not always be the case: there could be a bug in the old code path that the new one has fixed. Keeping this possibility in mind when examining errors logged by the science experiment can save a development team hours of head scratching, looking for a problem that isn’t really there. If the mismatch turns out to be a bug in the old code path, and you can confirm the new code corrects it, then the alert can safely be ignored.
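One way to act on this is to record each confirmed old-path bug as an explicit "ignore" rule. Known differences then stop raising alerts, while anything unexplained still does. Ignored differences are logged at a lower level rather than dropped, so they can still be counted. In this sketch, the rule shown (the old search leaving trailing spaces on names) is a made-up example of such a bug.

```python
import logging
from typing import Any, Callable, List, Optional

log = logging.getLogger("science_experiment")

# An ignore rule returns True when a mismatch is explained by a known,
# accepted difference (typically an old-path bug the new path fixes).
IgnoreRule = Callable[[Any, Any], bool]


def old_search_trailing_space(control: Any, candidate: Any) -> bool:
    """Hypothetical known bug: old search returned names with trailing spaces."""
    return isinstance(control, str) and control.rstrip() == candidate


def classify(control: Any, candidate: Any,
             ignore_rules: List[IgnoreRule]) -> Optional[str]:
    """Return None for a match, 'ignored' for a known difference, else 'mismatch'."""
    if control == candidate:
        return None
    if any(rule(control, candidate) for rule in ignore_rules):
        log.info("ignored known difference: control=%r candidate=%r",
                 control, candidate)
        return "ignored"
    log.error("mismatch: control=%r candidate=%r", control, candidate)
    return "mismatch"
```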

Challenges When Testing In Production

Using the circuit breaker and science experiment models brings some new challenges to your software development life cycle.

Collaboration

Using either of these approaches requires significant collaboration with a developer. This is both a challenge and an opportunity.

If a tester wants to try one of these approaches, they need to be involved early in the development process. They can then work with the developer or development team to determine exactly how it can be applied.

This could include:

Determining whether the approach will actually be workable. In an overly complex system, it may not be worth the effort.

Working out what sort of testability needs to be built in. This could come in the form of an admin page, or a console command, or some kind of script - or anything else.

Establishing how logging and alerting will be managed. How will the team be informed if something is triggered? An email, or a notification from a logging system, or something else?

Approval

These approaches can be expensive. They can significantly add to the amount of development and testing time needed, and this is a cost. As such, approval might be needed from a product owner or manager to do this kind of testing. They may need convincing that it’s worth the effort.

These are methods that mitigate risk, and in most cases the risk is losing customers or losing income. What is the extra cost of building this kind of tooling, compared with the cost of something going wrong?

It could cost the company some extra development or testing time. However, it could also potentially save the company millions if it catches an issue. If that's the case, then it could well be worth investing in.

Testing Two Code Paths

Using the circuit breaker and science experiment models means there are now two code paths. So before anything goes to production, both code paths need to be tested independently. It’s important to make sure the test code itself performs as intended. Both models highlight the importance of building testability into your product.

Before deploying code that uses either model, build a simple way of switching between the two code paths for testing purposes. A command line script or a feature flag might be a simple way to do this. Pairing with a developer can help decide the best way to carry out the necessary testing before the code moves to production environments.
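A feature flag for testing can be as simple as an environment variable that is read at call time. In this sketch, the variable name `PRICE_SERVICE_USE_NEW_PATH` and the placeholder return values are hypothetical. A tester could run the same checks once with the variable set to `true` and once with it set to `false` to exercise each path.

```python
import os

# Hypothetical flag name; unset means "use the new path".
FLAG_NAME = "PRICE_SERVICE_USE_NEW_PATH"
TRUTHY = {"1", "true", "yes", "on"}


def use_new_path(default: bool = True) -> bool:
    """Read the flag on every call so a test or script can switch paths
    just by changing the environment variable."""
    raw = os.environ.get(FLAG_NAME)
    if raw is None:
        return default
    return raw.strip().lower() in TRUTHY


def get_price(product: str) -> str:
    # Placeholder bodies: each branch would call the real code path.
    if use_new_path():
        return f"new path: {product}"
    return f"old path: {product}"
```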

For the circuit breaker, testing both paths includes checking that an error actually causes the flag to flip from true to false. For the science experiment, we need to test the function that compares the values, and be sure it tells us the truth when two results are different.
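Checks like these can be written as pytest-style test functions. The sketch repeats a minimal breaker and comparator so that it runs on its own; in a real codebase, the tests would import the production versions. `failing_new_path` is a hypothetical stand-in that always raises.

```python
from typing import Callable


class Breaker:
    """Minimal two-path breaker, repeated here so the tests are self-contained."""

    def __init__(self, new_path: Callable[[], str], old_path: Callable[[], str]) -> None:
        self.new_path, self.old_path = new_path, old_path
        self.use_new_path = True

    def call(self) -> str:
        if self.use_new_path:
            try:
                return self.new_path()
            except Exception:
                self.use_new_path = False
        return self.old_path()


def results_match(control: list, candidate: list) -> bool:
    """Comparator under test: same items in the same order."""
    return control == candidate


def failing_new_path() -> str:
    raise RuntimeError("forced failure")


def test_error_flips_flag_to_false() -> None:
    breaker = Breaker(failing_new_path, lambda: "old")
    assert breaker.call() == "old"
    assert breaker.use_new_path is False


def test_success_keeps_flag_true() -> None:
    breaker = Breaker(lambda: "new", lambda: "old")
    assert breaker.call() == "new"
    assert breaker.use_new_path is True


def test_comparator_reports_differences() -> None:
    assert results_match([1, 2], [1, 2])
    assert not results_match([1, 2], [2, 1])  # order matters
    assert not results_match([1, 2], [1, 2, 3])
    assert not results_match([], [None])
```

Running `pytest` on the file collects the three `test_` functions.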

This is where thinking about testability comes into play, and it might be helpful to build extra tooling. For example, one option for the circuit breaker might be a trigger that forces an error. For the science experiment, it could be useful to have a tool that can test the comparer function independently.
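A forced-error trigger could be a small fault-injection hook that the new code path checks before doing its work. In this sketch, `maybe_fail`, `force_error` and the path name are hypothetical. The switch is a context manager meant for tests. Exposing it through an admin page or console command would need the same protection against accidental use as the circuit reset.

```python
from contextlib import contextmanager
from typing import Iterator, Set

_forced_failures: Set[str] = set()  # names of code paths currently forced to fail


class InjectedFault(Exception):
    """Raised by a code path while a forced error is active."""


def maybe_fail(path_name: str) -> None:
    """Called at the top of the new code path; raises if a fault is injected."""
    if path_name in _forced_failures:
        raise InjectedFault(f"forced failure in {path_name}")


@contextmanager
def force_error(path_name: str) -> Iterator[None]:
    """Test-only switch: make `path_name` fail inside the `with` block."""
    _forced_failures.add(path_name)
    try:
        yield
    finally:
        _forced_failures.discard(path_name)  # always switch the fault off again


def fetch_price_y(product: str) -> float:
    """Hypothetical new code path instrumented with the hook."""
    maybe_fail("price_service_y")
    return 9.99
```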

Alerts And Logging

The other thing that must be tested is the monitoring or alerting system. Both methods rely on an alerting system being in place. A logging system such as Splunk or Sumo Logic can send alerts, for example.

It’s important to make sure that if something alarming happens, it is logged, and that the appropriate people are alerted! This could be through email or some other method.
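The logging half of this can be covered by an ordinary unit test. This sketch uses `unittest`’s `assertLogs` to check that a mismatch is written at ERROR level to a named logger. The logger name and message format are assumptions. Whether Splunk, Sumo Logic or another tool then turns that line into an email still has to be verified in that tool itself, for example by deliberately triggering a mismatch in a test environment.

```python
import logging
import unittest

log = logging.getLogger("science_experiment")


def report_mismatch(name: str, control: object, candidate: object) -> None:
    """Log at ERROR so a log-based alert rule (hypothetical) can match on it."""
    log.error("experiment=%s mismatch control=%r candidate=%r",
              name, control, candidate)


class AlertingTest(unittest.TestCase):
    def test_mismatch_is_logged_at_error_level(self) -> None:
        with self.assertLogs("science_experiment", level="ERROR") as captured:
            report_mismatch("user-search", "alice", "alicia")
        self.assertEqual(len(captured.records), 1)
        self.assertIn("experiment=user-search", captured.output[0])
```

Run it with `python -m unittest`.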

It would be a terrible shame if something happened in production, but nobody knew about it!

When It’s Over

The production test shouldn’t run forever. Once it has been running for a suitable amount of time without throwing an error, remove the test code. What counts as a suitable amount of time, of course, depends entirely on the context of the software being built. Leaving unused code lying around adds to technical debt: it makes the code bloated and more confusing than necessary, and can cause headaches for anyone who wants to run these kinds of tests in the future.



A good approach is to not consider the piece of work ‘done’ until the experiment is complete. This means the old code path has been removed from the codebase, and only the new one remains. If your team has a formal ‘definition of done’, clearing the technical debt should be part of this. This may mean the task remains ‘open’ for some time as the test remains active in production.

Reducing Risk & Gaining Robust Software

Either of these approaches will help reduce risk when making major software changes. The circuit breaker approach creates a safety net in case something goes wrong. The science experiment puts a feature through its paces, by evaluating real production usage. They are great tools for your development and testing toolkit. By using them when appropriate we can build more robust software. This is a win for both the team building the software, and the users using it.

Author Bio:

James is a software tester, currently helping to build world class giving and engagement solutions at Pushpay. He's a big believer in making life better for everyone - by building helpful, easy-to-use software; by helping other testers get better at their craft; or by taking his friends to the pub. When he's not doing the above, you can find him drawing pictures, running around in the forest, or playing video games with his wife and son. You can find him as @jamesespie on Twitter or @daily.pie on Instagram. He also hosts the SuperTestingBros podcast with his buddy Dan, which is definitely something you should listen to. Find them on Twitter as @SuperTestingBro.