Why Google’s Answer to AWS Reserved Instances is a Big Deal

Update, March 13: Two days later, AWS responded to this move by relaxing its rules on switching between instance types. However, this does nothing to ease the restrictions around instance families and generations. You are still bound by the network, disk, GPU, and CPU characteristics of the instance family.

I’ve been watching discussions around Google’s freshly minted Committed Use Discounts program and how it compares with Amazon’s Reserved Instances (RIs). The verdict is in: third parties like RightScale have already done the math and found Google to be at least 35% cheaper.

It’s easy to overlook the larger impact, and some press has already concluded that this is an easy glitch to fix (just close that 35% gap, you see…). Not so fast. Google Committed Use Discounts are much more than just a “pricing scheme”. There are some serious practical benefits that have nothing to do with cost, and the other cloud vendors aren’t in a position to compete here technologically any time soon. In the end, all customers win!

Let’s discuss the what, the how, and the why…

Google Committed Use Discounts vs Amazon RIs

In short, Google now lets users pre-purchase chunks of CPU and RAM on 1- and 3-year commitments in return for substantial discounts of up to 57%. With Google you can create Custom VMs, picking your own CPU and RAM configuration. All instances get fantastic networking, and all instances can get top-notch disk and GPUs. So you are truly buying CPU and RAM while retaining architectural flexibility. Users get to turn the IOPS/disk/network/GPU knobs whenever they want, regardless of “instance family” or other limitations that seem arbitrary to us.

This is in sharp contrast to what Amazon offers with RIs: pre-purchased instance types, which have specific characteristics like “nice network instances”, “GPU instances”, or “great storage instances”. Thus with Amazon you’re pre-purchasing a preset configuration of CPU/RAM/IOPS/network/GPU/disk characteristics with only minimal flexibility (mostly around EBS and instance sizes), and your ability to move to other preset configurations is severely limited. So you had better be damn sure you made the right choice, because you’re living with it for 1–3 years.
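To make the contrast concrete, here is a deliberately simplified Python model. The machine shapes and family labels are hypothetical. A resource-based commitment is modeled as a pool of vCPUs and RAM that any machine shape can draw down, while a reservation is modeled as matching only one exact (family, vCPUs, RAM) configuration, which is stricter than AWS's rules after the update noted at the top of this post. Regions, prices, and billing mechanics are ignored entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    family: str  # hypothetical family label, e.g. "compute" or "general"
    vcpus: int
    ram_gb: int


def resource_commitment_coverage(fleet, committed_vcpus, committed_ram_gb):
    """Share of the fleet's vCPUs and RAM covered by a resource-based
    commitment: every machine, whatever its shape or family, draws down
    the same committed pool."""
    used_vcpus = sum(m.vcpus for m in fleet)
    used_ram = sum(m.ram_gb for m in fleet)
    return (min(used_vcpus, committed_vcpus) / used_vcpus,
            min(used_ram, committed_ram_gb) / used_ram)


def type_locked_coverage(fleet, reservations):
    """Share of machines covered when each reservation matches only one
    exact (family, vcpus, ram_gb) configuration."""
    remaining = dict(reservations)
    covered = 0
    for m in fleet:
        if remaining.get(m, 0) > 0:
            remaining[m] -= 1
            covered += 1
    return covered / len(fleet)


# Year one: four 8-vCPU / 32 GB machines in a hypothetical "compute" family.
original = [Machine("compute", 8, 32)] * 4
# Later, the same 32 vCPUs / 128 GB are re-architected as eight smaller
# machines in a different (hypothetical) "general" family.
migrated = [Machine("general", 4, 16)] * 8

print(type_locked_coverage(original, {Machine("compute", 8, 32): 4}))  # 1.0
print(type_locked_coverage(migrated, {Machine("compute", 8, 32): 4}))  # 0.0
print(resource_commitment_coverage(migrated, 32, 128))                 # (1.0, 1.0)
```

The numbers are made up; the point is the shape of the result. After the re-architecture, the type-locked reservation covers nothing, while the resource pool still covers everything.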

Here’s another way to look at AWS instances. A “compute optimized” instance is just another name for an instance with inferior disk, inferior network, and a low amount of RAM. Why not have great everything?!

How how how is this even possible?

Google is able to offer this because of the unique nature of Google Cloud. Under the hood, Google Compute Engine is NOT a service that sells a bunch of VMs running on specific hardware. Compute Engine is an opinionated, living and breathing supercomputer, continuously carving out resources for its clients in the most optimal fashion (compare this to Microsoft’s perplexing claims in this space). Complexity is abstracted away, and users are exposed to familiar IaaS primitives: VMs, networking, disk, etc.

Since 2013, Google has relied heavily on Live Migration to make these primitives as customer-friendly as possible (not to mention to patch critical hypervisor flaws, perform maintenance, and eliminate noisy-neighbor problems). Goodbye maintenance windows!

Live Migration also lets us truly maximize performance and make that performance stable and predictable. Test us, I dare you! As far as I know, no other cloud has Live Migration, certainly not to the same degree.

Here are some more critical technical details of Google Cloud:

Google runs a homogeneous hardware footprint, designing and manufacturing the components that go into its data centers.

Google’s Jupiter network offers a petabit per second of bisection bandwidth within each data center cell. Bandwidth does grow on trees here, and it is a major reason why Google is able to offer juicy services like BigQuery and Spanner.

Borg is Google’s orchestrator (and a predecessor to Kubernetes), spinning up workloads on-demand, bin-packing resources, performing rolling upgrades, and making sure performance is top-notch all around.
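“Bin-packing” here means fitting many workloads onto as few machines as possible. Borg's real scheduler is far more sophisticated (priorities, preemption, scoring of candidate machines), so the Python sketch below is not Borg's algorithm. It is the textbook first-fit-decreasing heuristic applied to two dimensions, vCPU and RAM, using a hypothetical `first_fit_decreasing` helper and made-up request sizes.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    cpu_free: float
    ram_free: float
    placed: list = field(default_factory=list)


def first_fit_decreasing(requests, host_cpu, host_ram):
    """Place (name, cpu, ram) requests onto identical hosts: biggest
    requests first, each onto the first host with room, opening a new
    host only when no existing one fits."""
    # Order by each request's dominant share of a host, largest first.
    ordered = sorted(requests,
                     key=lambda r: max(r[1] / host_cpu, r[2] / host_ram),
                     reverse=True)
    hosts = []
    for name, cpu, ram in ordered:
        if cpu > host_cpu or ram > host_ram:
            raise ValueError(f"{name} does not fit on an empty host")
        for host in hosts:
            if host.cpu_free >= cpu and host.ram_free >= ram:
                break  # first existing host with enough room
        else:
            host = Host(host_cpu, host_ram)  # nothing fits: open a new host
            hosts.append(host)
        host.cpu_free -= cpu
        host.ram_free -= ram
        host.placed.append(name)
    return hosts


requests = [("a", 8, 32), ("b", 8, 16), ("c", 4, 40), ("d", 6, 8), ("e", 2, 4)]
hosts = first_fit_decreasing(requests, host_cpu=16, host_ram=64)
print([h.placed for h in hosts])  # [['c', 'b', 'e'], ['a', 'd']]
```

Sorting by a request's larger share of a host (its dominant dimension) is one common way to extend first-fit-decreasing beyond a single resource; many other orderings and scoring rules exist.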

Google is the only cloud provider running this way, and it has been doing so for the past 3.5+ years. The fact that no competitor has caught up in that long a timeframe shows just how damn hard the problem is, and how many technical barriers stand in the way of creating this offer.

Benefits of Committed Use Discounts

Let’s quickly run through some of the benefits of Google Committed Use Discounts:

They’re quite inexpensive. I suspect this matters for clients, but don’t hold me to it.

You aren’t required to pay upfront to get the inexpensive price.

On Google, you aren’t stuck on an “instance family”, forced to creatively cajole your sales rep into moving you up to the newest generation. Sadly, your sales rep’s incentives are in direct conflict with yours here, as even Pivotal found out (go watch that video!).

You aren’t stuck on a CPU/RAM combination that may be inefficient or wasteful for you. Re-carve it using Custom VMs.

Your network/disk/IOPS/GPU knobs aren’t soldered shut. You retain 100% of the flexibility here. This is a big deal!

As Urs mentioned in his talk yesterday, you can retire your “Ministry of RI Optimizations”. Stop playing RI Tetris, seriously: the sadomasochism is entirely optional.
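The “re-carve it using Custom VMs” point above comes down to rounding. The Python sketch below compares the smallest shape from a hypothetical predefined menu against a custom shape for the same request. The menu, the helper names, and the granularity rules (vCPUs rounded up to 1 or an even count, RAM rounded up to a multiple of 0.25 GB) are illustrative assumptions, not Google's actual machine types or limits.

```python
import math

# Hypothetical menu of predefined shapes: (vCPUs, RAM in GB).
PREDEFINED = [(2, 7.5), (4, 15.0), (8, 30.0), (16, 60.0)]


def smallest_predefined(cpu, ram):
    """Smallest predefined shape that satisfies both the CPU and RAM need."""
    fits = [s for s in PREDEFINED if s[0] >= cpu and s[1] >= ram]
    if not fits:
        raise ValueError("no predefined shape is large enough")
    return min(fits)


def custom_shape(cpu, ram, ram_step=0.25):
    """Custom shape under assumed granularity rules: vCPUs rounded up to
    1 or an even number, RAM rounded up to a multiple of ram_step GB."""
    vcpus = 1 if cpu <= 1 else 2 * math.ceil(cpu / 2)
    return (vcpus, math.ceil(ram / ram_step) * ram_step)


def waste(shape, cpu, ram):
    """Provisioned-but-unneeded (vCPUs, GB) for a given shape."""
    return (shape[0] - cpu, shape[1] - ram)


need_cpu, need_ram = 6, 20
fixed = smallest_predefined(need_cpu, need_ram)
custom = custom_shape(need_cpu, need_ram)
print(fixed, waste(fixed, need_cpu, need_ram))    # (8, 30.0) (2, 10.0)
print(custom, waste(custom, need_cpu, need_ram))  # (6, 20.0) (0, 0.0)
```

With a fixed menu, any request that falls between two shapes pays for the gap in both dimensions; a custom shape only pays for whatever the granularity rules round up.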

How do you compete with this?

You can answer Google’s Committed Use Discounts by lowering prices or by making RIs more user-friendly and less restrictive, but to offer a true equivalent of Google Committed Use Discounts, you need to do some serious engineering homework:

Get yourself a custom-manufactured data center stack with very few vendor dependencies

Make sure this stack is as homogeneous as possible

Build a Jupiter-like network that lets every instance in a data center talk to every other instance at 10G, all at the same time.

Acquire Borg or a similarly minded orchestrator, and invest heavily in it for 10+ years

Create VMs that give you best-in-class disk AND network AND GPUs, regardless of “instance family” or type.

Productize Live Migration and give it 3+ years to mature.

Productize Custom Machine Types
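For a sense of scale behind the Jupiter items above, here is a back-of-the-envelope calculation. It assumes the petabit figure means 1 Pb/s of total bisection bandwidth and uses the simple convention of dividing that by a 10 Gb/s per-host line rate, which glosses over how traffic actually splits across the bisection cut:

$$
\frac{1\ \text{Pb/s}}{10\ \text{Gb/s per host}} = \frac{10^{15}\ \text{b/s}}{10^{10}\ \text{b/s per host}} = 10^{5}\ \text{hosts}
$$

That is on the order of a hundred thousand machines, each sending at full 10G line rate simultaneously.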

Do not discount (no pun intended) the technical complexity here. These problems are very, very hard. As Eric Schmidt has said, Google has poured $30 billion into this bonfire over the past three years, and it shows. In the end, users win!

So think of Google next time you’re trying to make sense of your cloud bill, or next time your sales rep calls you to re-up your commitment, or next time you’re trying to get a discount from your cloud vendor. You deserve the best!
