The Hybrid Cloud

You hear a lot about private clouds, public clouds, hybrid clouds, and multicloud. CIOs of enterprises are struggling to chart the right path in the cloud maze. Enterprises are being bombarded by vendors, cloud providers, and integrators, each offering their own approach to the cloud. A cloud could be inside a private data center, in a public data center, at a remote edge location, or on top of an oil rig in the middle of the ocean. What really matters is the consumption model: one that offers easy deployment of applications and resources, ease of use and maintenance, and secure access. The cloud should be a utility that an enterprise uses to collect, store, and access data to create value for its customers and employees, and to thrive in a competitive environment.

It is clear by now that it is not an either/or decision between private and public clouds. Enterprises need to build their private cloud and tap into the resources of the public cloud when needed. Debating whether to go private or public is like debating whether one should own a home or rent a hotel room. Private and public clouds each have their use cases and will coexist, hence the notion of a hybrid cloud.

Having said that, many enterprises do not even have a private cloud yet. What they have is a data center filled with servers, switches, firewalls/VPNs, storage controllers, disks, and network management stations. The last decade and a half has been all about virtualization and making the most of on-prem compute resources. We are approaching the point of saturation in server virtualization: if you go by the 80/20 rule, almost 80% of the servers out there have been virtualized. Still, this does not qualify these data centers as private clouds. Remember, the cloud is about having a consumption model similar to the public cloud's.

Faced with their existing virtualized data centers, enterprises are torn between transforming those data centers into private clouds, outsourcing everything to the public cloud, or both. Amazon Web Services (AWS) was a pioneer in building a utility-based public cloud. That pushed many CIOs to jump on the public cloud bandwagon without putting their house in order. They soon realized that outsourcing a data center with all its inefficiencies just shifts the inefficiencies to the public cloud, where things can get really expensive. And while CIOs thought they could get rid of their expensive, compartmentalized IT resources, they ended up needing more IT resources to deal with the internal and external mess they had created. The early successes of public cloud adoption for mainstream applications are with the software as a service (SaaS) model, with applications such as Microsoft Office 365. This is a clear-cut case where the application is designed for the cloud, the licensing model is well understood, and the benefits are obvious. Without a crystal clear vision of where you are and where you want to be, jumping on a public cloud without a real foothold in the private cloud is asking for trouble.

There are many roads leading to the hybrid cloud, and vendors are taking different approaches. In the early days of AWS, it was understood that an EC2 instance in the cloud meant a virtual machine that AWS creates using its own hypervisor. For private cloud vendors, the plan was to instantiate VMs on the cloud using AWS instances and figure out how to move the application between the private and public cloud environments while maintaining its security policies. Services such as data backup and disaster recovery (DR) emerged, using replication techniques to move workloads and data into the cloud. Data can be stored in AWS S3 buckets or AWS Elastic Block Store (EBS) volumes depending on the frequency of access. So bursting into the cloud and using AWS native resources was about translating an enterprise environment into a cloud environment using the cloud-specific application programming interfaces (APIs). Cisco Systems released solutions such as Cisco CloudCenter, where the environment of an application on-prem can be translated into a public cloud environment such as AWS, Azure, or GCP, allowing such applications to be scaled out via cloud instances. Cisco also extended its ACI offering to allow on-prem security policies to be translated into native AWS security groups, ensuring that application security remains consistent between the private and public clouds. For Cisco this was the only option for hybrid, as the Cisco HyperFlex HCI solution comes bundled with its own hardware.
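To make the "policy translation" idea concrete, here is a minimal sketch of mapping a simplified on-prem contract (an ACI-style "allow web tier to reach app tier on tcp/8080") into the IpPermissions structure that AWS security group APIs expect. The contract format and names are my own illustrative assumptions, not any vendor's actual API.

```python
# Hypothetical sketch: translate a simplified on-prem policy contract
# into the rule shape an AWS security group API call would take.
# The input format is invented for illustration.

def contract_to_ip_permissions(contract):
    """Map a simplified policy contract to an AWS-style rule dict."""
    return {
        "IpProtocol": contract["protocol"],
        "FromPort": contract["port"],
        "ToPort": contract["port"],
        "IpRanges": [
            {"CidrIp": cidr, "Description": contract["name"]}
            for cidr in contract["source_cidrs"]
        ],
    }

web_to_app = {
    "name": "allow-web-to-app",
    "protocol": "tcp",
    "port": 8080,
    "source_cidrs": ["10.1.0.0/24"],  # the on-prem web tier subnet
}

rule = contract_to_ip_permissions(web_to_app)
print(rule["FromPort"], rule["IpRanges"][0]["CidrIp"])  # → 8080 10.1.0.0/24
```

The real work in products like CloudCenter or ACI integrations is of course far larger, but the essence is this kind of model-to-model mapping, done consistently in both directions.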

In parallel, as of August 2017, AWS started offering EC2 compute services on bare metal servers. This was a blessing for HCI vendors such as VMware and Nutanix, who can run their HCI software on third-party servers as well as on their own bundled hardware solutions. Since then, the hybrid cloud has taken another direction. VMware and Nutanix took advantage of the fact that their HCI software can now run on AWS bare metal, so the private cloud can extend into the public cloud without a major translation between the two environments. This is a selling point for enterprises, which can now leverage their existing software packages and expertise in the private cloud to deploy applications and services in the public cloud.

The first to adopt this hybrid approach was VMware, with the roll-out of VMware Cloud on AWS [1]. With VMware Cloud on AWS, you are basically running VMware's software on AWS bare metal servers. The software includes vSphere, vSAN, NSX, vCenter, and any tool you normally run on-prem. Similarly, Nutanix has recently announced Xi Clusters [2], which allow the Nutanix Enterprise Cloud software to run on AWS bare metal servers. Nutanix can also leverage its entire software suite to offer all sorts of services on the public cloud. Where such products differentiate is, first, in the implementation of their HCI private cloud solution and, second, in how well they integrate natively with AWS. Since these vendors are carrying their private cloud HCI implementation into the cloud, you can rest assured that the bulk of the HCI functionality and behavior will carry into the cloud. As an example, VMware vSAN requires a minimum of three hosts so that the data is replicated and protected. vSAN also requires that the storage and compute capacity be similar across the different hosts, and so on. So with VMware Cloud on AWS, you need a minimum of three AWS bare metal servers, with similar compute and storage on each. The Nutanix HCI solution for the private cloud also requires three hosts, but is more flexible in how storage is allocated per host, as its software automatically balances storage between the hosts. In my book, Hyperconverged Infrastructure Data Centers [3], I covered the details of HCI and the differentiation between Cisco HyperFlex, VMware vSAN, and Nutanix Enterprise Cloud.
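The constraints above can be expressed as a simple admission check. This is my own illustrative sketch, not VMware's actual logic; the 10% "similarity" tolerance is an assumption chosen just to make the idea concrete.

```python
# Illustrative sketch (not vendor code): check that a candidate cluster
# satisfies vSAN-style constraints -- at least three hosts, and storage
# capacity roughly uniform across hosts.

def meets_vsan_style_constraints(hosts, min_hosts=3, tolerance=0.10):
    """hosts: list of dicts, each with a 'storage_tb' capacity."""
    if len(hosts) < min_hosts:
        return False
    capacities = [h["storage_tb"] for h in hosts]
    low, high = min(capacities), max(capacities)
    # "similar" here means the smallest host is within `tolerance`
    # of the largest host's capacity
    return (high - low) / high <= tolerance

cluster = [{"storage_tb": 10.0}, {"storage_tb": 10.0}, {"storage_tb": 10.0}]
print(meets_vsan_style_constraints(cluster))       # → True
print(meets_vsan_style_constraints(cluster[:2]))   # → False (only two hosts)
```

A Nutanix-style cluster, with its storage-only nodes and automatic balancing, would effectively relax the capacity-similarity part of this check.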

The second point of differentiation is how well the solution integrates natively with the cloud provider. With VMware Cloud on AWS, VMware decided to have an overlay network on top of the AWS cloud. VMware would practically run VXLAN as it would in a private cloud. This means that the IP addressing of the VMware cloud is totally separate from the AWS VPC IP addressing. As a result, you need management and compute gateways running as controllers to translate between the VMware vSAN environment and the AWS environment. So if an application moves from the AWS bare metal server to a native AWS VM instance inside the AWS public cloud, it cannot keep its IP address. On the other hand, Nutanix decided to integrate the networking natively with AWS. So when you initiate a Nutanix cluster on the AWS bare metal servers, the IP addressing is taken from the AWS VPC. As such, applications can talk to each other natively over the AWS infrastructure and can easily move to an AWS native instance without resorting to gateways and routers.
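The "native addressing" idea can be sketched with nothing but the standard library: cluster nodes draw their IPs directly from the VPC subnet's range, so no overlay or gateway translation is needed. The subnet value below is illustrative; the reserved-address handling follows AWS's documented convention of reserving the first four and last addresses of every subnet.

```python
# Minimal sketch: hand out node IPs straight from a VPC subnet, the way
# a natively integrated cluster would, using only the stdlib.
import ipaddress

def allocate_node_ips(subnet_cidr, count, reserved=3):
    """Return `count` usable addresses from a VPC subnet.

    hosts() already excludes the network and broadcast addresses;
    AWS additionally reserves the next three (.1, .2, .3), which is
    what `reserved` skips here.
    """
    subnet = ipaddress.ip_network(subnet_cidr)
    usable = list(subnet.hosts())[reserved:]
    if count > len(usable):
        raise ValueError("subnet too small for requested node count")
    return [str(ip) for ip in usable[:count]]

print(allocate_node_ips("10.0.1.0/24", 3))
# → ['10.0.1.4', '10.0.1.5', '10.0.1.6']
```

With an overlay approach, by contrast, the cluster's addresses would come from a separate space, and every packet crossing into the VPC would pass through a translating gateway.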

VMware also decided to tap into AWS Elastic Block Store (EBS) by allowing the bare metal hosts inside the VMware public cloud environment to attach to EBS volumes. This is called Elastic vSAN. VMware states that this solves the problem of storage-heavy applications, as the solution is less expensive than adding hosts with physically attached storage. This is a total shift from the basics of HCI, where storage such as HDDs, SSDs, and NVMe drives is physically attached to the host. With HCI, the VMs access the physically attached storage via a direct path, bypassing the hypervisor. So VMware is making AWS EBS, which is network-attached storage, look like it is directly attached so it can be part of the vSAN pool. The advantage is that a bigger storage pool can be built with fewer hosts; the downside is that I/O performance in IOPS will take a big hit, because you are now going through layers of software to reach the storage over the AWS network.

With Elastic vSAN, VMware is trying to solve multiple problems in its current vSAN implementation. First, VMware vSAN recommends that all hosts have similar compute and storage capacity, so vSAN has issues with a mix of storage-heavy hosts and hosts with less storage. AWS has recently announced more flavors of bare metal servers [4], but even if AWS eventually offers storage-dense servers, vSAN would still not be able to accommodate a mix of servers that are not closely similar in compute and storage. Second, VMware would like to minimize the number of hosts as much as possible: every host carries a bundled ESXi hypervisor license, so things can get expensive. Even though the cloud solution is priced on a subscription basis per hour, per year, or per three years, the bundled ESXi licenses add cost. The third problem is host failure: with vSAN, if one host fails and a new host is rebuilt, the data is rebuilt on a one-to-one basis, so one of the existing hosts takes a performance hit rebuilding the new host. With EBS storage, compute is decoupled from storage, so when a host fails, the EBS volumes are detached and automatically attached to another host without having to rebuild the data. Basically, you are trading performance for storage capacity, and customers will have to evaluate on a case-by-case basis.
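A toy model (my own sketch, not vendor code) makes the contrast between the two recovery styles concrete: reattaching network volumes moves only references, while rebuilding replicated data onto a fresh host copies every lost terabyte over the network.

```python
# Toy model of the two failure-recovery styles described above.
# Hosts are plain dicts; 'volumes' holds volume sizes in TB.

def recover_by_reattach(failed_host, standby_host):
    """EBS-style: move the volume attachments; no data is copied."""
    standby_host["volumes"].extend(failed_host["volumes"])
    failed_host["volumes"] = []
    return 0.0  # terabytes copied

def recover_by_rebuild(failed_host, standby_host):
    """DAS-style: re-replicate the lost data onto the standby host."""
    lost_tb = sum(failed_host["volumes"])
    standby_host["volumes"].append(lost_tb)
    return lost_tb  # terabytes copied over the network

failed = {"volumes": [2.0, 3.0]}
standby = {"volumes": []}
print(recover_by_reattach(failed, standby))  # → 0.0 (nothing copied)
```

In the reattach case the recovery time is dominated by detach/attach latency, not by data size, which is exactly the property Elastic vSAN is after.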

Nutanix differentiates in this area as well. Nutanix HCI supports storage-only nodes, which means that Nutanix can accommodate a mix of servers with different flavors of compute and storage. Nutanix can stick with the traditional HCI model of physically attached storage, and can leverage storage-dense AWS bare metal servers whenever AWS adds that type of node to its bare metal lineup. I don't want to speak on behalf of Nutanix as to whether it will support EBS in the future, as there could be some use cases for it; the point is that there is enough flexibility in the Nutanix architecture to support all models. Also, when it comes to licensing, Nutanix does not charge for the AHV hypervisor, what Nutanix calls "No VTax" [5]. This minimizes the cost hit of adding more servers, even on a subscription basis. Regarding the third issue, node failure, Nutanix uses a many-to-many approach to rebuilding the data in the background. The impact of rebuilding a failed server is minimized because all nodes participate in the exercise. There are many other differences, but I just wanted to give you a sample. Basically, the more you know about a vendor's HCI implementation, the more you will know about the choices they make in their cloud implementation.
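A back-of-the-envelope calculation (my own numbers, assuming rebuild bandwidth scales linearly with the number of contributing nodes) shows why many-to-many rebuilds finish faster: every surviving node contributes bandwidth instead of a single replica partner.

```python
# Rough model: time to re-replicate lost data after a node failure,
# assuming aggregate rebuild bandwidth scales with participating nodes.

def rebuild_hours(lost_tb, node_bandwidth_gbps, participating_nodes):
    """Hours to copy `lost_tb` given per-node bandwidth in Gbps."""
    total_gbps = node_bandwidth_gbps * participating_nodes
    seconds = (lost_tb * 8_000) / total_gbps  # 1 TB ~= 8,000 gigabits
    return seconds / 3600

one_to_one = rebuild_hours(20, 10, participating_nodes=1)
many_to_many = rebuild_hours(20, 10, participating_nodes=8)
print(round(one_to_one, 2), round(many_to_many, 2))  # → 4.44 0.56
```

The absolute numbers are invented; the point is the ratio, and the fact that the per-node performance hit during the rebuild is spread thin instead of landing on one host.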

So where do we go from here? The space is getting more complex every day. This blog just scratches the surface with solutions such as VMware Cloud on AWS and Nutanix Xi Clusters. AWS is entering the market with AWS Outposts [6], where it is willing to deploy bare metal servers on-prem with full support. You can run native EC2 cloud instances on-prem, or use the server as bare metal to deploy any software you like. That's another blog and ball of wax. Also, with the emergence of Kubernetes as practically the de facto solution for hybrid and multicloud, welcome to the container world that is taking the enterprise by storm. You now hear announcements galore from Cisco, VMware, Nutanix, AWS, Google, Microsoft, IBM, and so on. All announcements promote cooperation (coopetition, really) on Kubernetes for the hybrid cloud. And while Google keeps pushing containers and Kubernetes, AWS is muddying the water with Firecracker and microVMs [7], which are lightweight VMs on the KVM hypervisor. These microVMs have the lightweight nature of containers but still offer isolation at the hardware virtualization level. To conclude, this space is getting really interesting (messy) and we are now in the eye of the storm. Stay tuned!

[1] https://ir.vmware.com/overview/press-releases/press-release-details/2017/VMware-and-AWS-Announce-Initial-Availability-of-VMware-Cloud-on-AWS/default.aspx

[2] https://www.nutanix.com/blog/xi-clusters-the-rise-of-the-true-hybrid

[3] https://www.amazon.com/Hyperconverged-Infrastructure-Data-Centers-Demystifying-dp-1587145103/dp/1587145103/ref=mt_paperback?_encoding=UTF8&me=&qid=#customerReviews

[4] https://aws.amazon.com/about-aws/whats-new/2019/02/introducing-five-new-amazon-ec2-bare-metal-instances/

[5] https://www.nutanix.com/blog/stop-bully-vmware

[6] https://www.businesswire.com/news/home/20181128005680/en/Amazon-Web-Services-Announces-AWS-Outposts

[7] https://aws.amazon.com/blogs/aws/firecracker-lightweight-virtualization-for-serverless-computing/