Changes in source code are an inevitable step of software development. Almost every living application evolves. As you crunch domain knowledge and translate it to a source code, you change an internal model of your application, sometimes significantly. It is not a problem in itself, but it might become an issue if this model is persisted and you would need to obtain it from storage in a future version of the application. Unfortunately, you never know upfront which version of the application would fetch this data.

Data migration problem

In this article, I would like to focus on the data migration problem, assuming that data is owned and managed by the frontend application only. It means that only the application executed on a user’s computer decides what to persist, what the structure is and how to interpret data obtained from the storage. The storage is unaware of the data’s meaning. It simply provides a persistence mechanism. It could be local storage of the web browser, a file on a user’s computer or some other storage system over which developers don’t have direct control.

Sample application with persistence

Let’s talk about a sample non-existent application, which allows the user to draw a diagram and save changes in JSON format for later use in the local storage of a modern web browser. Bear in mind that we can save the diagram today and load it 6 months later in a brand-new version of the application, for which the internal business model and persistence format have been changed.

A simplified JSON definition of such a diagram in a first version of the application might look like this:

The size of the diagram in pixels (width and height are the same). As you can imagine it may be insufficient, because most diagrams, that users can draw, are vertically or horizontally-oriented. It means, that we must store width and height as separate values in the document. Therefore, in a second version of the application, we decided to persist diagram data in a document structured like this:

The application lives, so it continuously evolves. We require, that users should be able to specify the background color of the diagram. To implement it we added yet another property to the JSON document. In a third version of the application, the persisted diagram definition could look this:

You might notice that each iteration changes something in the structure of the document. This persistence structure has to be reflected somehow in the business model of the application. In an ideal scenario both structures should be as similar as possible to avoid duality. The problem is that the shape of the internal business model can change very often due to new requirements or refactoring, which should lead to achieve a perfect and expandable model. The more you refactor, the more it gets out-of-sync with the persistence model. The internal business model of the diagram, after refactoring, could look like this:

Although it looks way better than the persistence model, we would have a duality of data structures.

How can we tackle the problem?

Let’s imagine that exactly the same model we want to persist in the application in version 4. Now, let’s consider how would we tackle the problem, if the user had access to the application in version 2 and wanted to load a diagram stored by the application in version 1. In order to fetch the data correctly, we could implement an adapter, which would translate version 1 to version 2.

If we had access to the application in version 3, how would we load a diagram stored by the application in version 2? We would have to implement an adapter, of course.

V2 is unaware of the background color, so we must set a default value, which is #ffffff in this case.

Finally, if we had access to the application in version 4, how would we load a diagram stored by the application in version 3? Here is the answer:

Now, having functions that implement transitions between consecutive versions, we are able to load diagrams stored in any format/version by the most recent application version. There is only one problem: detection of the diagram version. We need the version, because we have to call proper translating function to get the most recent format of data. So, how can we get the version? We can perform checks based on absence/existence of some properties. The simpler way is to save data together with its version in the same JSON document. This way, we can locate the proper transforming functions very easily and run them sequentially.

This is how the migration mechanism could look like:

And here we have a separate file which has the definition of all transitions:

The migrateIfNeeded function checks if the data provided requires migration to a newer format. If so, then it finds a proper function using the transitions object and calls it with current data. Then the same algorithm repeats until there are no more migration steps. It means that all migration steps have been executed and data has the most recent format.

The transitions object describes which function must be called in order to migrate data to a higher version.

The contract of the migration step is that it gets data in version X and then it returns data in version X+1. For better clarity each migration step can be defined in a separate file.

With this generic mechanism, we can obtain and translate data in any format, even the oldest one. We simply have to run the migration in a single part of the application. The best location is a function which loads the diagram:

Other parts of the source code are not even aware of the different storage format. It perfectly decouples the clean and consistent application model from persistence format details.

Summary

This very simple technique allows you to polish, extend and refactor the domain code. Forget about legacy code in the domain logic, which adds missing model properties or transforms values appearing in the old data model. Just move this responsibility to a tiny migration step function and clean your business logic code!