There is a popular belief that cloud vendors often overcommit on their capacity and put too many instances on physical hosts in order to improve their bottom line. I’ve decided to put this belief to the test with Google Cloud. The results are really surprising!

Working at DoiT International gives me the opportunity to meet customers from many geographies and verticals running various types of workloads on the public cloud.

While most of the customers are okay with running the instances on shared physical machines in Google Cloud, some are more picky about who they run with on the same node. Mostly, it’s due to regulatory reasons or workloads which do not tolerate noisy neighbours well.

Last month (Thursday, June 7, 2018), Google announced the beta availability of sole-tenant nodes.

In general, sole-tenant nodes are physical Compute Engine servers designed for your dedicated use. Normally, VM instances run on physical hosts that may be shared by many customers. With sole-tenant nodes, you have the host all to yourself.

Let’s start

The core idea is to compare shared tenancy instances versus sole-tenancy instances, and try to find significant (or insignificant) differences in the way they perform.

The first and the most obvious test would be to check CPU performance because CPU capacity is “compressible” i.e. you can theoretically, as a cloud provider, throttle the CPU to put more instances on each physical node.

Testing Environment

I started with a fairly simple setup:

- A single n1-standard-16 instance with 16 cores and 60 GB of memory
- A single instance on a sole-tenancy node with 16 cores and 60 GB of memory

The setup of regular n1-standard-16 instance is pretty straightforward:

gcloud compute instances create sh-vm1 \
    --zone=us-central1-f \
    --machine-type=n1-standard-16 \
    --min-cpu-platform=Intel\ Skylake \
    --image=debian-9-stretch-v20180611 \
    --image-project=debian-cloud

Note: I am requesting the Intel Skylake processor with "min-cpu-platform" because sole-tenancy nodes use this type of CPU.

Setup of sole-tenancy instance has three steps:

Creating a node template:

gcloud beta compute sole-tenancy node-templates create y-nodetemplate \
    --node-type n1-node-96-624 \
    --region us-central1

Provisioning node group:

gcloud beta compute sole-tenancy node-groups create y-nodegroup \
    --node-template y-nodetemplate \
    --target-size 1 \
    --zone us-central1-f

And finally create the instance itself:

gcloud beta compute instances create st-vm1 \
    --node-group y-nodegroup \
    --custom-cpu 16 \
    --custom-memory 60 \
    --zone us-central1-f \
    --image-family debian-9 \
    --image-project debian-cloud

The CoreMark Benchmark

To test the CPU, I decided to use CoreMark, a well-known benchmark developed in 2009 by Shay Gal-On at EEMBC.

Let’s compile it to match our test environment:

make PORT_DIR=linux64 ITERATIONS=200000 XCFLAGS="-g -DMULTITHREAD=16 -DUSE_PTHREAD -DPERFORMANCE_RUN=1 -pthread"

Now that we have the instances and tools deployed, let’s start the tests!

Test 1: Comparing CPU performance

To benchmark the CPU, I ran the CoreMark test 100 times on the shared tenancy instance as well as on the sole-tenancy instance. The benchmark score is the number of iterations completed per second in each run.
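The 100-run loop can be sketched as a small harness like the one below. The `collect_scores` helper and the score file name are hypothetical (my own naming, not from the original tests); with the build above you would point it at ./coremark.exe.

```shell
#!/bin/sh
# Hypothetical harness: run a benchmark command N times, collecting one
# result line per run, then report how many results were gathered.
collect_scores() {  # usage: collect_scores <runs> <outfile> <command...>
  runs=$1
  outfile=$2
  shift 2
  : > "$outfile"           # truncate the score file
  i=1
  while [ "$i" -le "$runs" ]; do
    "$@" >> "$outfile"     # one benchmark report per run
    i=$((i + 1))
  done
  wc -l < "$outfile"       # number of result lines collected
}

# For the actual test this would be something like:
#   collect_scores 100 scores.txt ./coremark.exe
```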

sh-vm1 is the shared tenancy instance; st-vm1 is the sole-tenancy instance

There is a slight advantage for the sole-tenancy instance but it’s not that dramatic (less than 3% on average). Another observation is that there are no significant drops in the performance of shared tenancy instances due to “noisy neighbours”.
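The per-instance averages behind a comparison like this can be computed with a one-liner. This is just a sketch: sh-vm1.txt and st-vm1.txt are hypothetical files holding one numeric CoreMark score per line, one line per run.

```shell
#!/bin/sh
# Average the collected benchmark scores in a file
# (assumes one numeric score at the start of each line).
mean_score() {  # usage: mean_score <scores-file>
  awk '{ sum += $1; n++ } END { printf "%.2f\n", sum / n }' "$1"
}

# e.g.:
#   mean_score sh-vm1.txt
#   mean_score st-vm1.txt
```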

Test 2: Putting more instances to work

This time I wanted to see the mutual interference of several instances running on the same hardware. Again, I ran Coremark 100 times on each of the instances at the same time.

sh-vm[1-6] are instances running on the same sole-tenancy physical machine

From the graph, it looks like there is no real mutual interference between the instances, although they are running on the same physical machine.

Overall, it looks like there is a small (though insignificant) advantage for the sole-tenancy instances, at least in terms of the CPU performance. What about I/O? Let’s test the network too!

Network Performance

Here I am going to test network performance of sole-tenancy instances and compare it to the shared tenancy instances. For this task, I’ve chosen Iperf as the benchmark tool.

sh-vm1 is the shared tenant instance
st-vm1 is the sole-tenant instance
st-vm2 is the Iperf server

Both sh-vm1 and st-vm1 ran iperf -c st-vm2 for a total of 10 iterations, with each instance running the test alone. Here comes a big surprise: as you can see in the chart below, things don't quite look the way you'd expect.
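The throughput figures can be pulled straight out of the client output. A sketch, assuming iperf 2, whose summary lines look like "[  3]  0.0-10.0 sec  18.4 GBytes  15.8 Gbits/sec" (the `bandwidth` helper name is mine):

```shell
#!/bin/sh
# Extract the Gbits/sec value from each iperf (v2) summary line in a log.
bandwidth() {  # usage: bandwidth <iperf-log>
  awk '/Gbits\/sec/ { print $(NF-1) }' "$1"
}

# e.g., one client run captured then parsed:
#   iperf -c st-vm2 -t 10 > run.log && bandwidth run.log
```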

It looks like the shared tenancy instance is getting better throughput. This leads me to suspect that the sole-tenancy physical machine doesn't really have 6x 16 Gbit network interfaces.

To validate this assumption I made one more test:

- An additional sole-tenancy machine with 6 instances: st-vm1s — st-vm6s
- An Iperf server running on all 6 of the new instances
- An internal load balancer set up on top of these instances with no affinity
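The load balancer step could look roughly like this. This is a hedged sketch only: the resource names (y-ig, y-hc, y-bes, y-fr) are hypothetical, the six st-vm[1-6]s instances are assumed to already exist in us-central1-f, and port 5001 is iperf 2's default; session affinity defaults to none.

```shell
#!/bin/sh
# Sketch: internal TCP load balancer in front of the six iperf servers.

# group the six backend instances
gcloud compute instance-groups unmanaged create y-ig --zone us-central1-f
gcloud compute instance-groups unmanaged add-instances y-ig \
    --instances st-vm1s,st-vm2s,st-vm3s,st-vm4s,st-vm5s,st-vm6s \
    --zone us-central1-f

# health check on the iperf port
gcloud compute health-checks create tcp y-hc --port 5001

# internal backend service with the group as its backend
gcloud compute backend-services create y-bes \
    --load-balancing-scheme internal --protocol tcp \
    --health-checks y-hc --region us-central1
gcloud compute backend-services add-backend y-bes \
    --instance-group y-ig --instance-group-zone us-central1-f \
    --region us-central1

# forwarding rule: its IP is what the iperf clients target
gcloud compute forwarding-rules create y-fr \
    --load-balancing-scheme internal --ports 5001 \
    --backend-service y-bes --region us-central1
```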

Overall, my setup is:

sh-vm1 as the shared tenancy instance
st-vm1 as the sole-tenancy instance
the load balancer IP address as the Iperf server

Network throughput in Gbits/sec

Now things start to make more sense. In most cases, the sole-tenancy instances have a small advantage but not something major.

All network tests ran for less than two minutes each. What if I ran them for much longer? Perhaps I would be able to catch a "noisy neighbour"?

10 hours long network benchmark

Surprisingly, it looks like the shared tenancy instances have more stable bandwidth behavior, i.e. they had less variance in throughput during the test.
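Variance here can be quantified directly from iperf's interval reports. A sketch, assuming a log captured with something like iperf -c <server> -t 36000 -i 60 so that each 60-second interval contributes one Gbits/sec line (the `bw_stats` helper name is mine):

```shell
#!/bin/sh
# Mean and standard deviation of the per-interval Gbits/sec figures
# in an iperf (v2) log.
bw_stats() {  # usage: bw_stats <iperf-log>
  awk '/Gbits\/sec/ { x = $(NF-1); sum += x; sq += x * x; n++ }
       END {
         m = sum / n
         printf "mean %.2f stddev %.2f\n", m, sqrt(sq / n - m * m)
       }' "$1"
}

# e.g.: bw_stats long_run.log
```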

Disk Performance

As everyone is well aware, disk throughput on public cloud instances is usually tightly bound to network performance; however, I was still curious to better understand the potential gaps (if any exist) between the two.

So, how does the disk performance of sole-tenancy instances compare to shared tenancy instances?

To test just this, I have attached a new Persistent SSD 200GB disk to each of the instances. To eliminate the Network egress caps on write throughput, I have also increased the number of cores to 32 on each of the instances.
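The preparation step could be sketched as follows; the disk name y-ssd1 is hypothetical, and sh-vm1 stands in for each instance under test.

```shell
#!/bin/sh
# Sketch: attach a fresh 200 GB SSD persistent disk and bump the core count.

# create and attach the benchmark disk
gcloud compute disks create y-ssd1 --type pd-ssd --size 200GB \
    --zone us-central1-f
gcloud compute instances attach-disk sh-vm1 --disk y-ssd1 \
    --zone us-central1-f

# raise the instance to 32 cores (the instance must be stopped first)
gcloud compute instances set-machine-type sh-vm1 \
    --machine-type n1-standard-32 --zone us-central1-f
```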

To test the IOPS and throughput of the disk, I have used fio with the following configuration:

#!/bin/bash
block_dev=/$mount/$point

# install dependencies
sudo apt-get -y update
sudo apt-get install -y fio

# full write pass
sudo fio --name=writefile --size=10G --filesize=10G \
    --filename=$block_dev --bs=1M --nrfiles=1 \
    --direct=1 --sync=0 --randrepeat=0 --rw=write \
    --refill_buffers --end_fsync=1 \
    --iodepth=200 --ioengine=libaio

# rand read
sudo fio --time_based --name=benchmark --size=10G --runtime=30 \
    --filename=$block_dev --ioengine=libaio --randrepeat=0 \
    --iodepth=128 --direct=1 --invalidate=1 --verify=0 \
    --verify_fatal=0 --numjobs=4 --rw=randread --blocksize=4k \
    --group_reporting

# rand write
sudo fio --time_based --name=benchmark --size=10G --runtime=30 \
    --filename=$block_dev --ioengine=libaio --randrepeat=0 \
    --iodepth=128 --direct=1 --invalidate=1 --verify=0 \
    --verify_fatal=0 --numjobs=4 --rw=randwrite --blocksize=4k \
    --group_reporting

Full write pass

+-------------------+--------+--------+
|                   | Shared | Sole   |
+-------------------+--------+--------+
| IOPS              | 96     | 96     |
| Throughput (MB/s) | 96.931 | 96.931 |
+-------------------+--------+--------+

Random read

+-------------------+--------+--------+
|                   | Shared | Sole   |
+-------------------+--------+--------+
| IOPS              | 8489   | 8489   |
| Throughput (MB/s) | 33.162 | 33.162 |
+-------------------+--------+--------+

Random write

+-------------------+--------+--------+
|                   | Shared | Sole   |
+-------------------+--------+--------+
| IOPS              | 6109   | 6110   |
| Throughput (MB/s) | 23.864 | 23.869 |
+-------------------+--------+--------+

It looks like there is no difference in disk performance between sole-tenancy and shared tenancy instances.

Final Conclusions

While sole-tenancy instances may appeal to customers in highly regulated verticals, at the end of the day it looks like Google is not overselling its hardware (and now I have tests to back up what I tell my students 🤓 in the Google Cloud classes I teach).

Unless you have regulatory requirements that force your workload onto sole-tenancy nodes, there is no real reason not to run on shared tenancy instances on Google Cloud, especially considering that they are cheaper…