Building an infrastructure in the cloud is an increasingly popular choice these days. Low startup costs, flexibility, and ease of deploying to multiple regions are all compelling features for new ventures and established enterprises alike. As part of this trend, virtualizing MongoDB is increasingly common.

Databases present specific challenges to virtualization, which in many cases have led to poor performance, especially before clear best practices emerged.

As part of a migration to a cloud hosting environment, David Mytton, founder and CTO of Server Density, investigated the best ways to deploy MongoDB on two popular platforms: Amazon EC2 and Google Compute Engine (GCE).

In part one of this two-part series, we review David’s general pros and cons of virtualization. In part two, we cover the challenges and methods of virtualizing MongoDB on EC2 and GCE.

Introducing David Mytton and Server Density

David Mytton is the CTO of Server Density, which boldly proclaims it offers “server monitoring that doesn’t suck.” Their service provides remote or on-premises monitoring of infrastructure. Besides the standard server metrics, they can collect any custom metrics you want via plugins, and they interoperate with Nagios plugins as well.

Server Density uses MongoDB to store all of its monitoring data. Every metric from every server for every client adds up to quite a bit of it! Each month Server Density ingests 250TB of monitoring data, inserting roughly a billion documents into MongoDB every day.
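To put those figures in perspective, the arithmetic below converts them to average per-second rates. It is only a rough sketch: the 30-day month and decimal terabytes are assumptions rather than numbers from Server Density, and the implied document size only holds if the 250TB figure and the document count describe the same data.

```python
# Back-of-the-envelope rates from the figures quoted above.
# Assumptions (not from Server Density): a 30-day month and decimal terabytes.
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 30        # assumption
BYTES_PER_TB = 10 ** 12    # assumption: decimal TB

docs_per_day = 1_000_000_000
bytes_per_month = 250 * BYTES_PER_TB

# Average rates; real traffic will have peaks well above these.
docs_per_second = docs_per_day / SECONDS_PER_DAY
bytes_per_second = bytes_per_month / (DAYS_PER_MONTH * SECONDS_PER_DAY)
implied_bytes_per_doc = bytes_per_month / (docs_per_day * DAYS_PER_MONTH)

print(f"{docs_per_second:,.0f} inserts per second on average")
print(f"{bytes_per_second / 10**6:,.1f} MB ingested per second on average")
print(f"{implied_bytes_per_doc:,.0f} bytes per document (if both figures cover the same data)")
```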

In his talk at MongoDB World 2014, David went into detail about why one would want to virtualize MongoDB, what to keep in mind while doing so, and the specifics of deploying MongoDB into both EC2 and Google Compute Engine.

Cloud Infrastructure vs. Bare Metal

David segments the overall trade-offs between bare metal and cloud VM providers into two categories: operational and financial.

Operationally, the cloud offers ease of management and agility, while bare metal offers performance and the ability to purchase machines tailored exactly to your workload.

Financially, cloud infrastructure costs more over time but has very small startup costs, while co-location of bare metal requires capital expenditure, and eventual liquidation of inventory, but costs less in the long run.
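The financial trade-off can be pictured as a simple break-even calculation. The sketch below is a deliberately simplified model, not David's: the function name and every figure in it are hypothetical, and it ignores depreciation, resale value, and growth.

```python
import math


def break_even_month(capex, bare_metal_monthly, cloud_monthly):
    """Return the first whole month at which cumulative bare-metal cost
    is no higher than cumulative cloud cost, or None if that never happens.

    All inputs are hypothetical figures in the same currency.
    """
    monthly_saving = cloud_monthly - bare_metal_monthly
    if monthly_saving <= 0:
        # Bare metal never catches up if it is not cheaper to run each month.
        return None
    return math.ceil(capex / monthly_saving)


# Illustrative numbers only; none of these come from David's talk.
months = break_even_month(capex=120_000, bare_metal_monthly=6_000, cloud_monthly=11_000)
print(f"Bare metal pays for itself after {months} months")
```

With these made-up inputs, the upfront spend is recovered at a rate of 5,000 per month. A larger gap between the two monthly bills shortens the payback period, and a smaller gap lengthens it.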

That’s just the high level overview, though… let’s get into the weeds.

Virtualization: Advantages

Virtual infrastructures are easy to manage, and agile, because provisioning an instance is fast and simple.

With public cloud providers, one can take advantage of machine templates (AMIs in the parlance of EC2, or Images on the GCE side). The public images (such as the official MongoDB AMIs) are well vetted, and you can roll your own if you want to deploy the same custom image to lots of hosts.

Containment is easy with a cloud architecture: just deploy everything into its own VM.

Snapshotting is very easy with cloud providers. This provides two benefits:

- Fast backup.
- Easy resizing and migration. If an instance requires vertical scaling, just take a snapshot, provision a new volume, and restore to the new volume from the snapshot.

With a large cloud provider, you have effectively unlimited resources to scale rapidly. If you need to add MongoDB replica set nodes, for example, or entire new shards, you can spin up instances and have them in the cluster within minutes.
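On EC2, the snapshot, provision, and restore sequence described above could be scripted roughly as follows. This is a minimal sketch, assuming a boto3 EC2 client is passed in; the function name `resize_via_snapshot` and its parameters are hypothetical, and all IDs, the zone, and the device name must be supplied by the caller.

```python
def resize_via_snapshot(ec2, old_volume_id, instance_id, zone, new_size_gib, device):
    """Snapshot a volume, create a larger volume from it, and attach it.

    `ec2` is expected to be a boto3 EC2 client, e.g. boto3.client("ec2").
    Returns the ID of the newly created volume.
    """
    # 1. Take a snapshot of the existing data volume and wait for it to finish.
    snapshot = ec2.create_snapshot(
        VolumeId=old_volume_id,
        Description="pre-resize snapshot",
    )
    snapshot_id = snapshot["SnapshotId"]
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])

    # 2. Provision a new, larger volume restored from that snapshot.
    volume = ec2.create_volume(
        SnapshotId=snapshot_id,
        AvailabilityZone=zone,
        Size=new_size_gib,
    )
    new_volume_id = volume["VolumeId"]
    ec2.get_waiter("volume_available").wait(VolumeIds=[new_volume_id])

    # 3. Attach the new volume to the target instance.
    ec2.attach_volume(VolumeId=new_volume_id, InstanceId=instance_id, Device=device)
    return new_volume_id
```

The sketch leaves out steps a real migration would need, such as making sure the data on disk is consistent before the snapshot and growing the filesystem on the new volume afterwards.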

It’s cheap to get started, even if you want to handle an unknown amount of load. You can spin up a lot of nodes without paying for physical hardware, and spin down what you don’t need when your load level is established.

The same flexibility means you can scale to handle seasonal traffic without being over-provisioned year-round.
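To make the seasonal argument concrete, the sketch below compares a fleet sized for the peak month all year with one that follows demand month by month. The demand profile and the function name are hypothetical, chosen only to illustrate the shape of the saving.

```python
def yearly_node_months(monthly_nodes_needed, elastic):
    """Total node-months paid for over the period.

    elastic=False: keep enough nodes for the busiest month the whole time.
    elastic=True:  pay only for the nodes each month actually needs.
    """
    if elastic:
        return sum(monthly_nodes_needed)
    return max(monthly_nodes_needed) * len(monthly_nodes_needed)


# Hypothetical demand: a quiet year with a busy final quarter.
demand = [4, 4, 4, 4, 4, 4, 4, 4, 4, 6, 12, 12]

fixed = yearly_node_months(demand, elastic=False)
scaled = yearly_node_months(demand, elastic=True)
print(f"fixed fleet: {fixed} node-months, elastic fleet: {scaled} node-months")
```

Node-months are used here instead of currency so that the comparison does not depend on any particular provider's pricing.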

With cloud providers, you can take advantage of other products they offer, such as DNS, email, storage, search, and load balancing.

Virtualization: Disadvantages

The hypervisor, which orchestrates virtualization, has overhead, and that affects performance.

VMs on the same host can experience contention for resources, especially in public clouds.

Databases such as MongoDB are particularly sensitive to IO latency, so this contention can lead to very poor performance if not accounted for.

Bare Metal: Advantages

With bare metal you get dedicated resources for all your apps, without the overhead of the hypervisor or contention between VMs.

You can completely customize your boxes, as opposed to having to use whatever configurations your provider offers.

Especially once you reach about 50 servers, even including the salaries or contracted cost of infrastructure expertise, owning your own hardware is much cheaper.

Bare Metal: Disadvantages

Unlike a virtual server, which can be provisioned in minutes, bare metal takes at least 4 hours to provision, and that’s assuming a good arrangement with a bare metal hosting service. It takes days to weeks if you’re ordering and racking in your own colo.

With bare metal, you must always be over-provisioned to handle growth.

Snapshotting is hard, or at least harder. LVM offers relatively easy snapshots, but not as easy as a button-click, and managing the storage is up to you.

Resizing is hard. In fact, no one would have called it “resizing” before virtualization; it was just called “getting a bigger box, migrating the app, and finding some hand-me-down use for the now unused old box.”

Investment! Bare metal requires either CapEx, with inventory depreciation and eventual liquidation, or leasing, both of which have higher upfront costs than provisioning VMs.

A Typical Trajectory

Because of the trade-offs, a typical trajectory for a new enterprise is to start up their infrastructure purely in the cloud, and eventually to migrate to data centers of their own once revenue and/or investment is established and the benefits of scale emerge. That’s not the only path, however. Sometimes operational concerns dominate, and businesses opt to stay with a cloud provider even after they reach a strict break-even point. And in some cases, businesses migrate from their own hardware to the cloud. This was the case with Server Density, and you can hear David discuss their rationale in detail in a video at the bottom of his post on the Server Density blog.

Stay tuned for the next installment, where we discuss the challenges of virtualizing databases in public clouds, as well as specific best practices for EC2 and GCE. In the meantime, download our operations white paper for best practices on deploying and managing a MongoDB cluster.

Avery is an infrastructure engineer, designer, and strategist with 20 years of experience in every facet of internet technology and software development. As principal of Bringing Fire Consulting, he offers clients his expertise at the intersection of technology, business strategy, and product formulation. He earned a B.A. in Computer Science from Brown University, where he specialized in systems and network programming, while also studying anthropology, fiction, cog sci, and semiotics. Avery got his start in internet technology in 1993, configuring Apache and automating systems at Panix, the third-oldest ISP in the world. He has an obsession with getting to the heart of a problem, a flair for communication, and a devotion to providing delight to end users.
