Recently I worked with another developer on refactoring one of the core pieces of functionality in our backend. This required a near rewrite of all the main classes and methods in order to enable more versatile functionality. Even after thorough testing, new bugs kept popping up and we had to revisit our code. This naturally caused us some stress about releasing the code to production without knowing whether all bases were covered.

This “release stress” is common among developers. Sometimes you feel more comfortable knowing there are bugs in your code than believing there are none. The bugs that go unnoticed can be the most dangerous, and they cause the biggest heartache when a manager calls you at 3am because the website has stopped working. There are always going to be bugs. This is why we test, why we write unit tests, and why we have processes that define how and when a release can happen.

In any developer’s anti-bug arsenal there is one weapon that deserves special mention. Unit tests are great, but they usually only cover the assumptions the developers have already made. Production environments are a whole new ball game: concurrent users and increasing load can push your application into any number of states you did not account for. This is why every developer working on a huge, sweeping change should have this strategy at their disposal.

Feature Toggle

The feature toggle is a technique for isolating a piece of code and activating it through a database setting or some other configuration mechanism. Feature toggles have been made popular by companies like Google and Facebook, along with similar practices such as A/B testing. While they are generally used to activate new features in an application, they can also be used to activate structural changes to a code base. This application of toggles is less common.

In our backend refactoring, we realized the best way to push our changes live would be to use a feature toggle. We created a new package, copied over the existing code and applied our refactoring there. In the layer above, where our code was called, we checked whether the new code was active or not. This meant we could push incremental changes to the new code while it wasn’t being used, and once everything was ready we could activate it by setting a flag in the database. If anything goes wrong, we simply switch back.
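To make the idea concrete, here is a minimal sketch of that “layer above” in Python. It is not our actual backend code: the table name `feature_toggles`, the toggle name `refactored_order_totals` and the two `*_total` functions are all made up for illustration, and an in-memory SQLite database stands in for whatever configuration store you use.

```python
import sqlite3

# Hypothetical toggle name; any unique string would do.
TOGGLE_NAME = "refactored_order_totals"


def legacy_total(prices):
    """Stand-in for the old code, left untouched in its original package."""
    total = 0
    for price in prices:
        total += price
    return total


def refactored_total(prices):
    """Stand-in for the copied-and-refactored code in the new package."""
    return sum(prices)


def create_toggle_table(conn):
    """Create the (hypothetical) table that stores one row per toggle."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feature_toggles ("
        "name TEXT PRIMARY KEY, enabled INTEGER NOT NULL)"
    )
    conn.commit()


def set_toggle(conn, name, enabled):
    """Switch a toggle on or off; this is the 'flag in the database'."""
    conn.execute(
        "INSERT OR REPLACE INTO feature_toggles (name, enabled) VALUES (?, ?)",
        (name, int(enabled)),
    )
    conn.commit()


def is_enabled(conn, name):
    """A missing row counts as 'off', so the old code is the default."""
    row = conn.execute(
        "SELECT enabled FROM feature_toggles WHERE name = ?", (name,)
    ).fetchone()
    return bool(row and row[0])


def choose_implementation(conn):
    """The only place in the calling layer that knows about the toggle."""
    if is_enabled(conn, TOGGLE_NAME):
        return refactored_total
    return legacy_total


def order_total(conn, prices):
    """What callers use; they never see which implementation ran."""
    return choose_implementation(conn)(prices)
```

Activating the new code is then a call such as `set_toggle(conn, TOGGLE_NAME, True)`, and switching back is the same call with `False`. Because a missing row is treated as “off”, deploying the new package does nothing until someone deliberately flips the flag.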

This is a fairly drastic measure and does of course create some code duplication. The idea is that once your new code has been stable for a period of time without issue, the old code can be tossed and the new code can drop the prefix “new” from its package name. The first problem is that such cleanup often gets skipped or forgotten; even in well-managed IT departments it’s easy to forget about such necessary tasks. After all, this is really no different from commented-out code, which most people agree is terrible practice. It is, however, a more organized version, much like break and continue statements are more organized versions of goto.

It’s important to use namespace or package names which clearly identify which is the old code and which is the new. That way, even two years later, someone will still be able to identify the code which should have been discarded, so long as it is not in use. If the new code doesn’t work out in the end, then the new code should be tossed and the old code should get its primary name back.

I’ve worked in a company where some database tables in our data warehouse were prefixed with “old” and “toBeDeleted”, and they had carried those prefixes for more than a year. When we investigated, it turned out of course that these tables were still in use, and no one knew why they had been scheduled for deletion or what the replacement had been. Apparently a developer had changed the table structure with his particular application in mind. This caused an unmaintained legacy system to break, so the developer returned the tables to their original state with said prefixes, thinking the legacy system would one day stop being used. Eventually he left for another job, taking the knowledge of these tables and the legacy system with him. This is a case of rogue feature toggles, and the risk should be taken into consideration when creating a toggle.
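One way to guard against this, beyond clear naming, is to write down why each toggle exists at the moment it is created. The sketch below is only an illustration of that idea and not something we had in place: the `ToggleRecord` fields, the `overdue_toggles` helper and the idea of a review date are all assumptions of mine.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ToggleRecord:
    """Hypothetical note kept alongside each toggle."""
    name: str        # toggle name as stored in the config
    owner: str       # person or team to ask about it
    reason: str      # why the old and new code exist side by side
    replaces: str    # package or table the new code is meant to replace
    review_by: date  # date by which one of the two versions should be gone


def overdue_toggles(records, today):
    """Return the toggles whose review date has already passed."""
    return [record for record in records if record.review_by < today]
```

Running `overdue_toggles(records, date.today())` in a scheduled job or a build step would at least surface the toggles nobody has looked at, along with who to ask and what they were meant to replace.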

Feature toggling in the manner that we’re suggesting does create overhead. If more than one team works in the same codebase, it can be difficult to manage two nearly identical pieces of code, so communication is necessary. Depending on the severity of the change, I do think this overhead is worth it in the end, particularly when making changes to a couple of classes which affect the entire application. It gives the developer a bit more freedom to refactor the classes without worrying about application downtime if something goes wrong.

The classes we refactored in our backend were a mess simply because every developer before us had tried to make the smallest possible change and did not dare to clean up the code unless it was absolutely necessary. The code was rotting, and because of how essential it was, no one wanted to touch it for fear of breaking something. In this case feature toggling finally gave us the freedom to do so. After we copied everything over and started to refactor, we found deprecated class variables, unused methods, logging that no longer made sense, and so on. The bugs we did introduce to the system, which we later fixed, gave us a more intimate knowledge of its inner workings, which we could then write tests for.

As long as their limitations are taken into account and they’re not abused, feature toggles can be a valuable tool not just for features but also for code rewrites and refactoring jobs.