I love refactoring. It is risky and challenging, as I might break the production system and lose business. At the same time, it's a rewarding experience to improve poorly performing or buggy code. For the past couple of months, I have been working on a legacy codebase to introduce a number of new features and overhaul the whole UI. Needless to say, this shouldn't break any existing functionality. We should also be able to enable the new features internally or for some users, while others shouldn't notice anything.

So we followed a technique called Branch by Abstraction. It is a simple technique where we create an abstraction layer that decides which code path to run based on some decision-making logic. It can be as simple as:

function checkPermission($param)
{
    if (feature_enabled('new_version')) {
        return improvedPermissionCheck($param);
    }

    return legacyPermissionCheck($param);
}

In this way, we can run both versions of the code without breaking the production system. We can write unit tests to check that both versions work as we expect. But how do we make sure both functions return the same result all the time? There can be edge cases where the two functions behave differently. How can we tackle those cases?

Meet Scientist

Last month GitHub released a shiny Ruby library called Scientist, which helps us tackle this problem. Dayle Rees ported this library to PHP, so we can use it in our PHP applications as well. Scientist provides a way to run both versions of the code and generates a report with insights such as whether the results match, the time taken to run each version, and so on. We can use these insights to figure out the scenarios where the refactored code doesn't work as we expected. The library has been ported to several other languages too.

How does it work?

Scientist works by creating an experiment around the code that we want to try. We need to create callbacks around the original code (Control) and the refactored code (Trial) that we want to experiment with. Scientist runs both functions and returns the result of the Control, while generating a detailed report of its findings. Scientist doesn't know what to do with the report, so we need to create Journals to handle it.

Enough talk, let's see the code.

Imagine we have a Post model with some access control checks.

<?php

namespace App\Models;

class Post extends Model
{
    ...

    public function hasAccess($user)
    {
        if ($user->role == 'admin') {
            return true;
        }

        return false;
    }

    ...
}

Now we want to move the roles into a new table and add a new method to the User model to check if the user has the admin role.

<?php

namespace App\Models;

class Post extends Model
{
    ...

    public function hasAccess($user)
    {
        if ($user->hasRole('admin')) {
            return true;
        }

        return false;
    }

    ...
}

As you can see, this is a minor but critical code change. If the tables are out of sync, the new function may return a different result for the same user. We want to make sure this change doesn't allow any unauthorized users to access the posts. Let us see how we can use Scientist to experiment with this change.

Installation

Before we do anything, we need to install the Scientist package in our project. Using Composer:

composer require daylerees/scientist

Using Scientist

In this case, we can split the old and new code into two different functions: legacyAccessCheck has the existing logic, which we trust (Control), and newAccessCheck has the new logic, which we want to test (Trial).

public function legacyAccessCheck($user)
{
    if ($user->role == 'admin') {
        return true;
    }

    return false;
}

public function newAccessCheck($user)
{
    if ($user->hasRole('admin')) {
        return true;
    }

    return false;
}

Then we convert the hasAccess method into a proxy for these functions and use Scientist to carry out the experiment.

public function hasAccess($user)
{
    $laboratory = new \Scientist\Laboratory;

    $experiment = $laboratory->experiment('experiment name')
        ->control([$this, 'legacyAccessCheck'])
        ->trial('first trial', [$this, 'newAccessCheck']);

    return $experiment->run($user);
}

Let's discuss the code line by line.

$laboratory = new \Scientist\Laboratory;

All scientific experiments need to be carried out inside a Laboratory. So first we create a Laboratory object, from which we can create as many experiments as we need. The Laboratory also allows us to configure Journals to handle the experiment reports.

$experiment = $laboratory->experiment('experiment name');

This creates a new experiment with the given name. Providing a name helps us when handling reports.

$experiment->control([$this, 'legacyAccessCheck']);
$experiment->trial('first trial', [$this, 'newAccessCheck']);

This will register our control and trial callbacks.

By default, Scientist runs the trial callback every time it runs the experiment. Sometimes this might affect performance, especially on high-traffic servers. Optionally, we can specify a percentage chance of running the experiment using the chance method.

$experiment->chance(50);

Then we run the experiment.

return $experiment->run($user);

As I mentioned earlier, this runs both the control and trial functions, but only returns the result of the control function, the one that we trust.

Journals

So far so good: we created an experiment, ran both versions of our code, and got back the result of the old version. But what about the findings? We didn't tell our Scientist what to do with them. We need to create Journals to handle the experiment report. Once we configure the Journals (yes, we can have multiple Journals), Scientist sends the report to them. It is then the responsibility of the Journals to decide what to do with the result, whether to save it in a data store or send it to a monitoring service.

Journals should implement the Scientist\Journals\Journal interface, which defines a report method. For example:

<?php

namespace App\Journals;

use Scientist\Report;
use Scientist\Experiment;
use Scientist\Journals\Journal;

class DatabaseJournal implements Journal
{
    /**
     * Dispatch a report to storage.
     *
     * @param \Scientist\Experiment $experiment
     * @param \Scientist\Report     $report
     *
     * @return mixed
     */
    public function report(Experiment $experiment, Report $report)
    {
        $control = $report->getControl();
        $trial   = $report->getTrial('first trial');

        // Store the report in the database
        $data = [
            'name'           => $experiment->getName(),
            'params'         => json_encode($experiment->getParams()),
            'value'          => $trial->getValue(),
            'matches'        => $trial->isMatch(),
            'trial_memory'   => $trial->getMemory(),
            'control_memory' => $control->getMemory(),
            'exception'      => $trial->getException(), // if any
        ];

        ...
    }
}

Once we create a Journal, we need to register it with the Laboratory.

$laboratory->addJournal( new DatabaseJournal );

Alternatively, we can register multiple Journals at once using the setJournals method.

$laboratory->setJournals([ new DatabaseJournal, new RedisJournal ]);

Now the findings of our experiment will be passed to the report method, where $experiment is the instance of the experiment being run.

Exceptions

We don't want our users to see exceptions thrown by our experimental code. Scientist suppresses any exceptions thrown by trials, but keeps them in the report. This way we can analyze the exceptions and fix them accordingly.
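To illustrate, here is a minimal sketch of this behaviour, assuming the daylerees/scientist package is installed via Composer; the experiment name and closures are hypothetical, and the exact suppression semantics should be verified against the installed version of the library:

```php
<?php

require 'vendor/autoload.php';

$laboratory = new \Scientist\Laboratory;

$experiment = $laboratory->experiment('exception handling')
    ->control(function () {
        // Trusted path: behaves normally.
        return true;
    })
    ->trial('risky trial', function () {
        // Refactored path: blows up.
        throw new \RuntimeException('Something broke in the refactor');
    });

// The trial's exception does not bubble up to the caller;
// we still receive the control's value here, and a Journal
// can inspect the captured exception via the trial report.
$result = $experiment->run();
```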

Custom Matchers

We can also define custom matchers to override the default matcher, which compares the results using "===". A matcher should implement the Scientist\Matchers\Matcher interface, which defines a match method to compare the results from the control and the trial.
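As a rough sketch, a matcher that treats loosely equal results (==) as a match might look like the following; the class name is my own invention, and the method signature and the way the matcher is attached to an experiment should be checked against the library's documentation:

```php
<?php

namespace App\Matchers;

use Scientist\Matchers\Matcher;

// Hypothetical matcher: consider results matching when they are
// loosely equal (==) rather than strictly identical (===).
class LooseMatcher implements Matcher
{
    /**
     * Determine whether the control and trial values match.
     *
     * @param mixed $control
     * @param mixed $trial
     *
     * @return bool
     */
    public function match($control, $trial)
    {
        return $control == $trial;
    }
}

// Attaching it to an experiment (assumed API):
// $experiment->matcher(new \App\Matchers\LooseMatcher);
```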

Caveats

It doesn't make much sense to use Scientist to experiment with code that has side effects, such as creating files or updating the database.

Obviously, there will be some performance overhead, as we have to run two methods at the same time. Consider lowering the chance value to minimize the overhead.

Summary

If you are refactoring an existing project, I would highly recommend using Scientist to experiment with your changes. I would say Scientist is not just a library, but a pattern for refactoring legacy code. Scientist can help us refactor and release our codebase with confidence, while providing meaningful insights about the execution of the different code paths.

We should also be mindful of the cost of running experiments in production. So instead of running the experiments for every request, we can configure them to run for only a small percentage of requests.