By Nathaniel Kangpan, SVP Technology & Data Services, Kepler Group

I have yet to meet a client in the past 12 months that doesn’t already (a) have a team or two dipping their toes into cloud-based infrastructure, or (b) have the cloud on their roadmap for 2018. Total cost of ownership (TCO) savings for migrating from hardware to the cloud are far too convincing to be ignored.

But what should you really expect when you move from owned hardware to a cloud-based infrastructure like AWS? Should you just replicate your solutions on virtual servers like Amazon EC2, or should you use more managed services like Amazon RDS?

At Kepler Group, more than 95 percent of our infrastructure has been on AWS since late 2014. Over the past few years, we’ve advised numerous clients on what to expect when they make the switch themselves. We provide marketing database management services. One of the most common discussions we have with clients is helping them understand the benefits they’ll realize and the challenges they’ll likely encounter when they move their relational databases to AWS.

A Global Fortune 100 example

We typically frame conversations about database cloud migrations around a representative project that we completed for a Global Fortune 100 client. For this particular client, we first built a MySQL database on owned hardware in a collocated data center. We then moved it to an EC2 instance running MySQL, and finally ended up on Amazon Aurora with MySQL compatibility. The client’s use cases and underlying data schemas didn’t change much during our relationship. So we learned quite a lot about the true pros and cons of running the same MySQL database on multiple frameworks as we sought to become increasingly efficient in managing the solution.

The database in question was a centralized marketing and sales customer relationship management (CRM) database. It continuously aggregated data across multiple third-party sources for reporting and analytics use cases in a series of email and sales team-based marketing campaigns. In addition to managing the database, our team would also be its primary user, responsible for providing reporting and analytics as a managed service.

This project was on the smaller side of what we typically manage in terms of scope and budget. So in addition to meeting the client’s needs, we needed to pay close attention to the following:

Keeping database maintenance effort low

Limiting infrastructure costs

Ensuring that we had a reliable backup and recovery process

As mentioned earlier, we went through three different infrastructure solutions for this database and learned quite a bit about the benefits and challenges with each version:

v1.0: Linux on a hardware box running MySQL

v2.0: Linux on Amazon EC2 running MySQL

v3.0: Amazon Aurora with MySQL compatibility

The following migration summary explains the decision to go with each version and the benefits and challenges we realized through each iteration.

Version 1.0: Linux on a hardware box running MySQL

When we started our relationship with this client in late 2013, we had started looking into cloud-based services, but our infrastructure was predominantly a hardware solution based in a collocated data center. As many who are working in client services or under tight deadlines can appreciate, we needed to prioritize getting something up and running quickly, rather than building the ideal long-term solution out of the gate. We decided to start the relationship with a Linux-in-a-metal-box-running-MySQL combination because that’s what the engineers working on this project were most comfortable with.

Benefits

Looking back, the only real benefit to this initial approach was the familiarity our engineers had with working on a hardware + Linux + MySQL stack. The required development frameworks, data transfer mechanisms, and so on, were all fairly well understood, and we knew not to expect any major technical surprises. This gave us confidence that we could hit our client’s deadlines with minimal risk to timelines and budgets, in contrast to the risks of jumping into a cloud-based solution with our then limited AWS experience.

Challenges

However, the cons to maintaining a solution in a hardware environment were fairly numerous. We didn’t fully appreciate these inefficiencies until we made our migration to AWS later on. Specifically, we faced the following challenges with a hardware solution relative to cloud:

A fairly high server and database maintenance and upgrade effort

A less-than-seamless vertical scaling process to accommodate growing data volumes over time

The need to implement a custom backup process that also required more manual recovery processes whenever we needed to restore a snapshot

The ever-persistent, though unlikely, risk that the hardware could fail at any time

Today, the downsides of hardware-based solutions compared with mature cloud-based solutions are fairly well understood, but they were still a bit vague in early 2014.

Version 2.0: Linux on Amazon EC2 running MySQL

By mid-2014, we reached a point where we needed to vertically scale our entire data center in a substantial way to accommodate an exponentially growing set of data ingestions and analytics use cases for several clients. We conducted some straightforward TCO analyses, using AWS calculators to benchmark the cloud against hardware solutions. We found that we could realize substantial and immediate savings by making a large-scale migration to AWS, rather than continuing to grow our hardware infrastructure.

After some lengthy debates among the engineering team, we concluded that we could not cost-effectively support both a physical data center solution and a cloud platform. If we were going to move to the cloud, we would need to go all in across all our services.

We wanted to limit the number of changes we needed to undergo all at once. So instead of jumping straight into a managed database solution like Aurora, we moved our MySQL databases for current clients onto EC2 instances with a Linux-based OS running the same version of MySQL that we were running in our data center.

Benefits

We realized two immediate benefits from this migration:

First, our infrastructure costs went down substantially compared to our hardware-based data center. This was after accounting for collocation fees, additional hardware costs, and all the other incremental costs that come with running hardware.

Second, vertically scaling our solution to accommodate an ever-growing series of third-party data sources became truly seamless. We didn’t need to physically go to a data center, add another box, format and provision the new server, and so on. We could just type a few commands, and instantaneously we’d get an incremental 10–100 GB of scale.

There were a couple of other stated benefits for Amazon EC2 that we didn’t realize or gain from our migration for this particular client. To be fair, this was likely because they didn’t quite apply to our use case. For example, we didn’t necessarily need additional redundancy gained from having multiple Availability Zones. For those unfamiliar, an AWS Region includes multiple Availability Zones, and each Availability Zone is a separate facility. If one Availability Zone fails, we could in theory deploy to another fairly rapidly. However, for our use case, we did not need a solution with 99.99 percent uptime and availability. There were only 2–3 primary users of the solution that we were building. Reports and analyses were not generally required on demand and could be completed on an ad hoc basis within a 24–48 hour window.

Challenges

We faced two broad categories of challenges with this solution:

Solution onboarding: We had a number of one-time learning curve issues that we had to adapt to, given this was our first big migration onto a cloud service. Everyone in our engineering team would now agree that it’s dramatically easier to do things like procure servers, manage networking, and back up data through AWS than it was on our legacy hardware stack. However, it still took a few revs to learn how to do these things seamlessly. Had we done this for the first time under a tight deadline to create a solution for a client, we would have experienced a lot of incremental stress without the help of an experienced AWS engineer or consultant.

We had a number of one-time learning curve issues that we had to adapt to, given this was our first big migration onto a cloud service. Everyone in our engineering team would now agree that it’s dramatically easier to do things like procure servers, manage networking, and back up data through AWS than it was on our legacy hardware stack. However, it still took a few revs to learn how to do these things seamlessly. Had we done this for the first time under a tight deadline to create a solution for a client, we would have experienced a lot of incremental stress without the help of an experienced AWS engineer or consultant. Server and database maintenance: With the Amazon EC2 setup, we still had to patch the server and manually perform upgrades on the database to stay in line with the internal operating procedures and service level agreements we had with the client. The time we spent maintaining the database, outside of adding more compute and storage resources, was not dramatically different from the hardware-based solution we ran previously.

Version 3.0: Aurora

Throughout the remainder of 2015, we introduced a number of new data partners for our client. A few in particular were guilty of sending us corrupted and unclean data that would somehow make it past our data quality checks. As I’m sure many who work with third-party data providers have experienced, the vendor was proficient at finding creative ways to introduce new exceptions that would override how we had previously handled an issue. This occasionally caused integrity problems within the broader database that we had built for the client, and we’d have to manually restore from a point prior to the ingestion. As such, we found ourselves spending far more time than we liked maintaining backup and point-in-time snapshot processes.

Running MySQL on an EC2 instance had been a great entry point to understanding how core cloud services worked in a way that was accessible to our existing team’s knowledge. But by early 2016, we figured it was time to take a further step into AWS infrastructure and move this particular database into Amazon Aurora. We could then take advantage of its built-in features for maintenance, including updates, upgrades, and backup/restore processes.

Benefits

In hindsight, we should have gone straight to Aurora for our use case. Moving to Aurora removed most of the remaining inefficiencies we had in maintaining our solution. It dramatically freed our engineers and DevOps teams from the manual components of point-in-time restore processing. It was also fairly seamless to migrate from our Amazon EC2 solution to Aurora because our engineers had built up familiarity with the general AWS stack, and data was already stored on Amazon S3.

Compared to the hardware-based solution we started out with, Aurora gave us the following:

Significantly lower infrastructure costs: Capex (capital expenditure) resource costs are now down by 50–60 percent.

Capex (capital expenditure) resource costs are now down by 50–60 percent. Dramatically reduced maintenance needs : We freed up our day-to-day database administration resourcing to support this system by about 80 percent, allowing our teams to work on building product value for our clients elsewhere.

: We freed up our day-to-day database administration resourcing to support this system by about 80 percent, allowing our teams to work on building product value for our clients elsewhere. Seamless vertical scaling to accommodate growing data needs: Adding capacity is now measured in minutes, not hours or days.

Adding capacity is now measured in minutes, not hours or days. A turnkey backup and restore process: This is now done with a few clicks on the AWS Management Console rather than in a lengthy, error-prone custom process.

Aurora works beautifully for our particular client’s use case. This is now our default solution whenever we need to build a relational database running MySQL for smaller-scale reporting and analytics projects.

Challenges

So, what are the downsides? When we first completed the migration in mid-2016, it was unclear how to get regular access to the server and usage logs. To get regular access to logs, we would have had to set up a custom job like an AWS Lambda process to regularly pull and ship them out. This issue appears to have been partially addressed with recent updates that allow MySQL to pipe logs directly into Amazon CloudWatch, a cloud monitoring service for AWS resources. However, analytic database products like Amazon Redshift seem to have log access built in.

Although Aurora worked well for this use case, it’s important to remember that no solution is a fit for all problems. We’ve considered Aurora in a number of instances where it was unclear whether the advantages would outweigh the costs of upfront solution development and ongoing maintenance. For example, to support a consumer-facing solution that required multi-Region support and thousands of concurrent writes, we’d likely have to customize a solution built with SSD storage options and create multiple instances, shard them, and so on. This would probably take a substantial amount of upfront effort, and the ongoing maintenance could become complex.

I’ll add a caveat by saying we haven’t built this kind of solution using Aurora, so our assumption of inefficiency here is based on partially informed opinion. That said, AWS has announced an upcoming feature named Amazon Aurora Multi-Master, which could be a good fit for this kind of use case.

We’ll never use hardware again

It’s been almost four years since we first completed version 1.0 of this solution for our client, and three years since we first migrated the database to AWS. Since then, we’ve finished dozens of database-related projects on AWS for various clients across nearly all the different services and frameworks offered, from Amazon Redshift to Amazon DynamoDB to Aurora.

After the initial learning curve of moving to the cloud during the first few projects, I’ve become one of the biggest advocates for AWS among our client’s tech teams. For managed solutions like Aurora, the benefits accrue beyond just cost savings. Our engineers and DevOps teams spend anywhere from 60 to 80 percent less time on maintenance, upgrades, and recovery processes—freeing them up to spend more time creating.