In a recent post I compared the actual usable capacity delivered by Nutanix ADSF and VMware vSAN on the same hardware, where we saw approximately 30-40% more usable capacity from Nutanix.

Next up we compared Deduplication & Compression technologies where we learned Nutanix has outright capacity efficiency, flexibility, resiliency and performance advantages over vSAN.

Now we’ll look into Erasure Coding, which is another valuable and proven technology that can drive further capacity and potentially performance efficiencies, complementary to Deduplication and Compression.

As I’ve highlighted in the previous articles, “Tick-Box” style slides commonly lead to incorrect assumptions for critical architectural/sizing considerations such as capacity, resiliency and performance.

This problem is also applicable to Erasure Coding. Let me give you a simple example:

| Feature | Nutanix | VMware vSAN |
|---|---|---|
| Erasure Coding / RAID5/6 | ✅ | ✅ |

The above table shows Erasure Coding / RAID 5&6 are supported by Nutanix and VMware vSAN.

With this information, customers/partners/VARs could conclude the data reduction capabilities of both products are the same, or at least that the difference is insignificant for a purchasing or architectural decision, right?

If they did, they would be very mistaken for many critical reasons.

The following table shows which data reduction configurations are currently supported by each product:

| Feature | Nutanix All Flash | Nutanix Hybrid | vSAN All Flash | vSAN Hybrid |
|---|---|---|---|---|
| Erasure Coding / RAID5&6 | ✅ | ✅ | ✅ | ❌ |

As we’ve previously learned, vSAN does not support Deduplication or Compression on Hybrid configurations (Flash + HDD) and the same is true for Erasure Coding.

This gives customers choosing hybrid platforms a clear advantage in going with Nutanix, as they will more than likely be able to store significantly more data on the same, or even less, hardware.

As I discussed in the previous Deduplication & Compression comparison, the messaging from VMware has consistently been that vSAN does not support data reduction on hybrid intentionally because data reduction on hybrid (HDD) tiers should not be done for performance reasons. This messaging lacks credibility, which I’ll explain later in this article.

Next up, we’ll look deeper at how, when and where the potential capacity savings from Erasure Coding take place.

Let’s start with what data each implementation of Erasure Coding applies to:

| Erasure coding applied to | Nutanix | VMware vSAN |
|---|---|---|
| Write hot | ❎ | ⚠️ |
| Write cold | ✅ | ❌ |
| Read hot | ✅ | ✅ |
| Read cold | ✅ | ✅ |

The first type of data (write hot) is by far the most important when considering Erasure Coding, because striping the data and then servicing subsequent overwrites is expensive both computationally and in back-end IO, an effect referred to as “write amplification“.

Applying Erasure Coding to “Write Hot” data means all incoming write IO suffers a front-end performance penalty. i.e.: The performance as seen by the applications & VM is impacted.

For vSAN, RAID 5 or 6 is applied via SPBM, but it’s either ON or OFF, which forces customers to make a choice between performance and capacity efficiency.

With Nutanix, when enabled, the Erasure Coding implementation dynamically determines what data is suitable for EC-X based on whether the data is write cold, at a 4MB (extent group) granularity.

I wrote an article some time ago titled What I/O will Nutanix Erasure coding (EC-X) take effect on? where I show the following diagram:

While this diagram shows a hybrid configuration, the same process is followed for All Flash. The only difference being the last step (at the base of the diagram) is not applicable.

This means EC-X can be enabled on all data (literally) and if the data is “write hot” or frequently overwritten, EC-X won’t be applied. For data that is “write cold” (i.e.: Not being written to, but may be actively being read), EC-X will be applied.
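The write-cold check described above can be sketched as follows. This is a minimal illustration, not Nutanix code: the 4MB extent group granularity is from the article, but the one-hour “write cold” threshold, the function names and the data structures are all hypothetical assumptions for the example.

```python
from time import time

EXTENT_GROUP_BYTES = 4 * 1024 * 1024  # 4MB granularity, per the article
WRITE_COLD_AFTER_SECS = 60 * 60       # assumed: no writes for 1 hour = write cold

def is_write_cold(last_write_ts: float, now: float) -> bool:
    """An extent group becomes EC-X eligible once it stops receiving writes."""
    return (now - last_write_ts) >= WRITE_COLD_AFTER_SECS

def ec_eligible_groups(groups: dict[str, float], now: float) -> list[str]:
    """Return IDs of extent groups whose data is write cold (EC-X candidates)."""
    return [gid for gid, ts in groups.items() if is_write_cold(ts, now)]

# Example: one group written 10 seconds ago (write hot), one untouched for 2 hours.
now = 1_000_000.0
groups = {"eg1": now - 10.0, "eg2": now - 7200.0}
print(ec_eligible_groups(groups, now))  # ['eg2']
```

The key design point is that eligibility is evaluated per extent group, so hot and cold data within the same VM are treated independently.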

With Nutanix, customers get the best of both worlds, with no impact to front end write performance AND EC-X is only applied to suitable (write cold) data — all dynamically managed by the Acropolis distributed storage fabric (ADSF).

For example: if a SQL server has 10TB of data, of which 1TB is actively being written to (or is new IO) and 9TB is actively being read but not overwritten (changed), EC-X is applied to 90% (9TB) of the VM’s data for maximum capacity efficiency, while the 1TB of write-hot data, which is not suitable for Erasure Coding, gets optimal performance with RF2 or RF3.

An argument I’ve heard from VMware is that applying Erasure Coding (RAID5/6) in-line prevents the need for potentially significant levels of free capacity during large data ingest. While this has some validity, vSAN has a firm requirement for 25-30% slack space regardless of cluster size, so regardless of whether vSAN does Erasure Coding inline or not, that argument offers no (or at best an insignificant) advantage for vSAN customers.
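To put numbers on the slack-space point: even with inline RAID5, reserving 25-30% slack significantly erodes usable capacity. The 25-30% slack range is from the article; the 100TB raw figure and the 3+1 RAID5 stripe (75% efficiency) are illustrative assumptions.

```python
raw_tb = 100.0
raid5_efficiency = 0.75            # assumed 3 data + 1 parity: 75% of raw is usable

for slack in (0.25, 0.30):
    usable = raw_tb * raid5_efficiency * (1 - slack)
    print(f"{int(slack * 100)}% slack -> {usable:.2f} TB usable of {raw_tb:.0f} TB raw")
```

So of 100TB raw, only 52.5-56.25TB is actually available for data even with RAID5 efficiency applied, under these assumptions.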

For Nutanix, in an extreme situation where a multi-PB amount of data needs to be ingested AND the cluster/s has not been designed with sufficient headroom, resiliency etc., EC-X can be applied more aggressively (simply put, closer to inline), in addition to compression and/or deduplication, to address these needs.

Another uncompromisingly simple and effective solution for Nutanix customers.

Now to address VMware’s claim around not enabling data efficiency on Hybrid platforms:

Let’s say Nutanix Erasure Coding (EC-X) adds a massive 50% latency penalty (which it doesn’t, as we’ve learned it’s dynamically applied only where suitable and never impacts front-end incoming IO), and the average combined read/write latency without EC-X is 2ms.

Add 50% to that and we get 3ms average latency but we now have 50% more data (assuming a conservative 1.5:1 EC-X ratio) in the faster tier (e.g.: NVMe or SSD) as opposed to the same 50% of data being serviced by SATA (in the case of Hybrid) where latency would be more like 10ms on average.
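The hybrid latency argument above, as arithmetic. All inputs are the article’s own illustrative assumptions: a 2ms base latency, a deliberately pessimistic 50% EC-X penalty, a conservative 1.5:1 EC-X ratio, and roughly 10ms average latency for SATA HDD.

```python
base_latency_ms = 2.0
ecx_penalty = 0.50                 # assumed worst case; EC-X skips write-hot IO anyway
ecx_ratio = 1.5                    # conservative data reduction ratio
hdd_latency_ms = 10.0              # typical SATA HDD average latency

flash_with_ecx_ms = base_latency_ms * (1 + ecx_penalty)  # 3.0 ms
extra_data_in_flash = ecx_ratio - 1                      # 50% more data fits in flash

print(f"Flash tier even with EC-X penalty: {flash_with_ecx_ms:.1f} ms")
print(f"Same data spilled to HDD instead:  {hdd_latency_ms:.1f} ms")
print(f"Extra data kept in the flash tier: {extra_data_in_flash:.0%}")
```

Even under a worst-case penalty, 3ms from flash beats 10ms from HDD for the data that EC-X keeps in the faster tier.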

This simple example shows that Erasure Coding on hybrid systems (as well as All Flash) can and frequently is a major performance ADVANTAGE!

With that said, customers often choose hybrid for use cases which host lots of cold data, in which case it’s infrequently accessed and even if performance is impacted, a 1.5:1 or 2:1 efficiency is likely well worth a potential performance penalty.

At 2:1 that’s HALF the nodes required to store the same data! Customers could enable Erasure Coding and spend some of the saved money on additional nodes to increase performance and still end up with a NET saving and a great business outcome!

For All Flash systems with mixed drive types (NVMe & SATA-SSD), a conservative 1.5:1 ratio would still mean 50% more data served by NVMe rather than SATA-SSD, which would provide some latency and performance benefits, albeit a smaller advantage than on Flash & HDD platforms.

That is a pretty solid outcome even if we assume some impact to latency.

Summary:

We’ve covered how marketing slides or sales pitches showing both products supporting the same feature can be very misleading, as the implementations are worlds apart in terms of maturity and real-world value.

Nutanix EC-X delivers maximum capacity efficiency to suitable data in a dynamic manner while not impacting the front end IO.

This is literally the best of both worlds, capacity efficiency plus performance.

vSAN customers are forced into a brute-force ON or OFF choice, which directly and negatively impacts front-end IO (performance) for all write IO, even where RAID5/6 is not a good fit for the dataset.

Add to that that Nutanix delivers Erasure Coding for Hybrid environments, where workloads are often extremely complementary to EC-X as they are commonly used for long-term archive, secondary storage (snapshots) and datasets with lower IO requirements, and it’s clear that Nutanix is the leader in both performance and capacity efficiency.

Next up, let’s review how Nutanix & vSAN can scale storage capacity!
