Migrating a Monolith to Google Kubernetes Engine (GKE) — Data migration

Get Cooking in Cloud

Authors: Priyanka Vergadia, Carter Morgan

Introduction

“Get Cooking in Cloud” is a blog and video series to help enterprises and developers build business solutions on Google Cloud. In this third miniseries, we are covering migrating a monolith to Google Kubernetes Engine (GKE). Migrating a monolith to microservices can be intimidating. Once you decide to take it on, what should you consider? Keep reading…

In these articles, we will take you through the entire journey of migrating a monolith to microservices: the migration process, what to migrate first, the different stages of migration, and how to deal with data migration. Our inspiration for these articles is this solutions article. We will top it all off with a real customer story that walks through those steps in a real-world application.

Here are all the articles in this miniseries for you to check out.

In this article, we will discuss the four different methods of data migration. So, read on!

What you’ll learn

What questions to ask before you migrate data

Four approaches to migrating data from a monolith to a microservices application

Prerequisites

Before you begin, it’ll be helpful to understand the following:

Basic concepts and constructs of Google Cloud so you can recognize the names of the products.

The previous videos in this Get Cooking in Cloud series.


Questions to ask before you begin

Are eggs a lot like data migration strategies? Well, maybe not so much, but one thing I know is that both eggs and data migrations can be handled in multiple ways. Source: https://pixabay.com/vectors/egg-breakfast-yolk-protein-fry-575756/

There are many ways to cook an egg.

You can fry it, boil it, scramble it, and more. Which approach you use depends on a variety of factors: How much time do you have? What’s it being paired with? And so on.

My preferred method is over hard, but that’s not what’s important here!

What’s important is that when it comes time to migrate data, just like with eggs, there really is no single best approach. The strategy you use will depend on the answers to a few important questions:

How much data do you need to migrate?

How often does this data change?

Can you afford the downtime represented by a cut-over window — the time it takes to transition traffic from the old service to the new one — while migrating data?

What is your current data consistency model?

The answers to these questions will guide you toward the data migration recipe that best fits your needs.

4 Data Migration Approaches

Today we will share four different data migration approaches, each of which tackles different issues, depending on the scale and the requirements of the data migration.

The data migration approach you use is based on:

the cut-over window size that your application can tolerate,

the refactoring effort that your team can handle,

and the amount of flexibility each approach offers.
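To make these trade-offs a little more concrete, here is a rough Python heuristic that maps a few of the questions above to the four approaches covered below. It is a sketch under simplifying assumptions, not an official decision tree: MigrationConstraints and suggest_approach are hypothetical names, and a real decision should also weigh data volume, change rate, and your consistency model.

```python
from dataclasses import dataclass


@dataclass
class MigrationConstraints:
    """Hypothetical, simplified answers to the questions above."""
    can_afford_cutover: bool        # Is any cut-over window acceptable?
    long_cutover_ok: bool           # Is a long cut-over window acceptable?
    has_data_access_service: bool   # Can all reads and writes go through one data-access microservice?


def suggest_approach(c: MigrationConstraints) -> str:
    """Rough heuristic based on the trade-offs described in this article."""
    if c.has_data_access_service:
        # Preferred for many microservice architectures: no cut-over window,
        # and the refactoring is confined to one component.
        return "centralized Y"
    if not c.can_afford_cutover:
        # No cut-over window at all: dual writes, at the cost of refactoring each workload.
        return "Y"
    if c.long_cutover_ok:
        # Minimal refactoring; most of the burden falls on operations.
        return "scheduled maintenance"
    # A cut-over window is possible but must be short: replicate continuously.
    return "continuous replication"


print(suggest_approach(MigrationConstraints(True, False, False)))  # continuous replication
```

The ordering of the checks is one possible reading of the article, which favors the centralized Y recipe whenever a data-access microservice is an option.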

Let’s take a look at a few different strategies.

Scheduled maintenance

The scheduled maintenance approach is ideal if your workloads can afford a cut-over window. It is scheduled in the sense that you can plan when the cut-over window occurs. Here is how it works:

Copy data from the legacy site to the new site. Doing this ahead of time minimizes the cut-over window. (For more details, refer to the earlier article, “Migrating in stages”.)

Compare the legacy data against the new dataset to make sure there weren’t any copy errors.

Stop further data from being written to the system during the migration.

Synchronize any changes that happened after the initial copy.

Deploy new versions of the workloads and services that use the new site.

Start the workloads and services.

Retire the legacy site when you no longer need it as a fallback option.
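Here is a minimal Python sketch of these steps, assuming a toy in-memory Store class in place of real databases and an append-only write log as the way to find changes made after the initial copy. Store, initial_copy, find_mismatches, and sync_changes are hypothetical names for illustration; a real migration would rely on your database’s own export, checksum, and change-tracking tooling.

```python
class Store:
    """Toy in-memory key-value store standing in for a real database (illustrative only)."""

    def __init__(self):
        self.data = {}
        self.write_log = []   # keys in write order; lets us find changes made after a checkpoint
        self.frozen = False   # True while writes are stopped for the cut-over window

    def write(self, key, value):
        if self.frozen:
            raise RuntimeError("writes are stopped during the cut-over window")
        self.data[key] = value
        self.write_log.append(key)


def initial_copy(legacy, new):
    """Bulk-copy everything and return a checkpoint into the legacy write log."""
    new.data.update(legacy.data)
    return len(legacy.write_log)


def find_mismatches(legacy, new):
    """Validation step: return the keys whose values differ between the two sites."""
    keys = legacy.data.keys() | new.data.keys()
    return sorted(k for k in keys if legacy.data.get(k) != new.data.get(k))


def sync_changes(legacy, new, checkpoint):
    """Copy only keys written after the checkpoint (deletes are ignored in this sketch)."""
    for key in set(legacy.write_log[checkpoint:]):
        new.data[key] = legacy.data[key]


# Simulated run of the steps above.
legacy, new = Store(), Store()
legacy.write("order:1", "pending")
legacy.write("order:2", "shipped")

checkpoint = initial_copy(legacy, new)      # copy ahead of time to shrink the window
assert find_mismatches(legacy, new) == []   # no copy errors

legacy.write("order:1", "paid")             # traffic keeps flowing before the window
legacy.write("order:3", "pending")

legacy.frozen = True                        # cut-over window starts: stop writes
sync_changes(legacy, new, checkpoint)       # only the delta is synchronized
assert find_mismatches(legacy, new) == []
# Next: deploy workloads configured for `new`, start them, keep `legacy` as a fallback.
```

The checkpoint is what keeps the window short: only the keys written after the initial copy (two here) have to be synchronized while writes are stopped.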

This approach places most of the burden on the operations side, because it requires minimal refactoring of the workloads and the system.

Continuous replication

Not all workloads can afford a long cut-over window, so the continuous replication approach builds on the scheduled maintenance approach by adding a continuous replication mechanism after the initial copy and validation steps.

The steps are:

Copy data from the legacy site to the new site to minimize the cut-over window, just like in the scheduled maintenance approach.

Compare the legacy data against the new dataset to make sure there weren’t any copy errors.

Set up a continuous replication mechanism from the legacy site to the new site.

Stop further data from being written to the system during the migration.

Refactor workloads and services to use the new site.

Wait for the replication to fully synchronize the new site with the legacy site.

Start the workloads and services.

Retire the legacy site when you no longer need it as a fallback option.
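A minimal Python sketch of the replication loop follows, assuming the legacy site exposes an ordered log of full-value change events (similar in spirit to change data capture). The Store and Replicator classes and their poll and lag methods are hypothetical; in practice you would use your database’s native replication or a managed migration service.

```python
class Store:
    """Toy store whose write log records (key, value) change events in commit order."""

    def __init__(self):
        self.data = {}
        self.write_log = []
        self.frozen = False

    def write(self, key, value):
        if self.frozen:
            raise RuntimeError("writes are stopped")
        self.data[key] = value
        self.write_log.append((key, value))


class Replicator:
    """Hypothetical replicator that replays legacy change events on the new site."""

    def __init__(self, source, target, start=0):
        self.source, self.target = source, target
        self.position = start   # index of the next change event to apply

    def lag(self):
        return len(self.source.write_log) - self.position

    def poll(self, max_events=100):
        """Apply up to max_events pending changes; a real system runs this continuously."""
        end = min(self.position + max_events, len(self.source.write_log))
        for key, value in self.source.write_log[self.position:end]:
            self.target.data[key] = value
        self.position = end


legacy, new = Store(), Store()
for i in range(5):
    legacy.write(f"user:{i}", {"plan": "free"})

# Capture the log position before copying so no change is missed; replaying a
# change that was already copied is harmless because each event carries the full value.
start = len(legacy.write_log)
new.data.update(legacy.data)                  # initial copy
assert new.data == legacy.data                # validation
replicator = Replicator(legacy, new, start)   # continuous replication from here on

legacy.write("user:2", {"plan": "pro"})       # legacy keeps serving traffic
replicator.poll()

legacy.write("user:5", {"plan": "free"})
legacy.frozen = True                          # cut-over: stop writes
while replicator.lag() > 0:                   # wait for replication to catch up
    replicator.poll(max_events=1)
assert new.data == legacy.data
# Start the refactored workloads and services against `new`.
```

Because replication runs the whole time, the cut-over window only has to cover the last few unreplicated events rather than everything written since the initial copy.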

This approach is more complex than the scheduled maintenance approach, but it shortens the required cut-over window because, by the time you stop writes, only a small amount of data is left to synchronize.

Y recipe

If your workloads have hard high-availability requirements and you can’t afford a cut-over window, a different approach, which we call the Y recipe, makes sense. In this recipe, the workload writes data to both the legacy site and the new site during the migration, and reads from one of them at a time.

Here is how this would look:

Refactor the subsystem to write data to both the legacy site and the new site, while still reading from the legacy site.

Copy into the new site the data that was written before the new site began receiving writes.

Compare the legacy data against the new dataset to make sure there weren’t any errors.

Switch read operations from the legacy site to the new site.

Perform another round of data validation and consistency checks.

Disable writing to the legacy site.

Retire the legacy site when you no longer need it as a fallback option.
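To show where the development effort goes, here is a minimal Python sketch of a repository that a subsystem could be refactored to use during a Y migration. Plain dicts stand in for the two data stores, and DualWriteRepository, backfill, and mismatches are hypothetical names. The backfill uses insert-if-absent so that it never overwrites a newer value that arrived through a dual write; deletes are not handled.

```python
class DualWriteRepository:
    """Illustrative data access code inside one subsystem during a Y migration."""

    def __init__(self, legacy, new):
        self.legacy, self.new = legacy, new
        self.read_from_new = False   # flipped when reads switch to the new site
        self.write_legacy = True     # turned off when legacy writes are disabled

    def put(self, key, value):
        if self.write_legacy:
            self.legacy[key] = value
        self.new[key] = value        # every write also goes to the new site

    def get(self, key):
        source = self.new if self.read_from_new else self.legacy
        return source.get(key)


def backfill(legacy, new):
    """Copy data written before dual writes began, without overwriting newer values."""
    for key, value in legacy.items():
        new.setdefault(key, value)


def mismatches(legacy, new):
    keys = legacy.keys() | new.keys()
    return sorted(k for k in keys if legacy.get(k) != new.get(k))


legacy_db = {"cart:1": ["apple"], "cart:2": []}   # data written before the migration
new_db = {}
repo = DualWriteRepository(legacy_db, new_db)

repo.put("cart:3", ["pear"])                 # dual writes; reads still hit legacy
backfill(legacy_db, new_db)                  # copy the older data
assert mismatches(legacy_db, new_db) == []   # validate
repo.read_from_new = True                    # switch reads
assert repo.get("cart:1") == ["apple"]
assert mismatches(legacy_db, new_db) == []   # validate again
repo.write_legacy = False                    # disable legacy writes
repo.put("cart:2", ["milk"])
```

In the Y recipe, every workload that touches this data needs equivalent logic, which is why the effort lands on the development side.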

Unlike the scheduled maintenance and continuous replication approaches, the Y (writing and reading) recipe shifts most of the effort from the operations side to the development side because of the multiple refactorings it requires. On the plus side, it doesn’t require a cut-over window!

Centralized Y

Now, if you want to reduce the large amount of refactoring effort needed in the Y approach, you can centralize data read and write operations by refactoring the workloads and services to use a data-access microservice.

This scalable microservice becomes the only entry point to your data storage layer, and it acts as a proxy for that layer.

It’s actually pretty similar to the Y approach we discussed earlier. The only difference is that instead of refactoring each workload and service that touches the data, we focus the refactoring effort on the data-access microservice alone.

Refactor the data-access microservice to write data to both the legacy site and the new site. Reads are still performed against the legacy site.

Identify the data that was written before you enabled writes in the new site, and copy it from the legacy site to the new site. Together with the refactoring above, this ensures that the data stores are aligned.

Perform data validation and consistency checks comparing data in the legacy site against data in the new site.

Refactor the data-access microservice to read from the new site.

Perform another round of data validation and consistency checks comparing data in the legacy site against data in the new site.

Refactor the data-access microservice to write only to the new site.

When you no longer need the legacy site as a fallback option, retire it.
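Here is the same idea moved behind a single data-access microservice, again as a minimal Python sketch. The Phase enum is an assumption that models each refactoring step above as a switch inside one component; DataAccessService and checkout_service are hypothetical names, and a real service would sit behind an API on GKE rather than being called in-process.

```python
from enum import Enum


class Phase(Enum):
    # Each phase stands in for one redeploy of the data-access microservice (an assumption).
    DUAL_WRITE_READ_LEGACY = 1   # write both sites, read legacy
    DUAL_WRITE_READ_NEW = 2      # write both sites, read new
    NEW_ONLY = 3                 # write and read only the new site


class DataAccessService:
    """Single entry point to the storage layer; callers never talk to a database directly."""

    def __init__(self, legacy, new, phase=Phase.DUAL_WRITE_READ_LEGACY):
        self.legacy, self.new, self.phase = legacy, new, phase

    def write(self, key, value):
        if self.phase is not Phase.NEW_ONLY:
            self.legacy[key] = value
        self.new[key] = value

    def read(self, key):
        if self.phase is Phase.DUAL_WRITE_READ_LEGACY:
            return self.legacy.get(key)
        return self.new.get(key)

    def backfill(self):
        """Copy data written before dual writes began; never overwrite newer values."""
        for key, value in self.legacy.items():
            self.new.setdefault(key, value)

    def mismatches(self):
        keys = self.legacy.keys() | self.new.keys()
        return sorted(k for k in keys if self.legacy.get(k) != self.new.get(k))


def checkout_service(dal, user):
    """A hypothetical caller: it never knows which site is active."""
    return dal.read(f"cart:{user}") or []


legacy_db, new_db = {"cart:alice": ["book"]}, {}
dal = DataAccessService(legacy_db, new_db)
dal.write("cart:bob", ["pen"])            # dual writes, reads from legacy
dal.backfill()
assert dal.mismatches() == []
dal.phase = Phase.DUAL_WRITE_READ_NEW     # read from the new site
assert checkout_service(dal, "alice") == ["book"]
assert dal.mismatches() == []
dal.phase = Phase.NEW_ONLY                # write only to the new site
dal.write("cart:alice", ["book", "lamp"])
assert checkout_service(dal, "alice") == ["book", "lamp"]
assert legacy_db["cart:alice"] == ["book"]   # legacy no longer receives writes
```

Note that checkout_service never changes as the migration progresses; only the data-access layer does, which is what keeps the rest of the architecture untouched.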

In a microservice architecture, this recipe is the preferred method and gives you the most flexibility, because you can refactor the data-access component without impacting other components of the architecture and without requiring a cut-over window. It puts a burden on the development side, but significantly less than the Y approach discussed above.

Conclusion

Four happy chefs; four happy approaches to migrating data! Source: https://pixabay.com/illustrations/chef-character-cook-gourmet-1417239/

In summary, we looked at four recipes for migrating data, and picking the right one depends on weighing the pros and cons for your situation.

Just like there are many ways to cook an egg, the data migration recipe you use depends on your app’s cut-over window, refactoring costs, and the amount of flexibility each approach offers. But for many microservice architectures, the centralized Y recipe will be the preferred method, as it maximizes flexibility and uptime.

If you are looking to migrate data from your monolith into a microservice architecture, you’ve now had a small taste of the trade-offs involved. Stay tuned for more articles in the Get Cooking in Cloud series, and check out the references below for more details.

Next steps and references: