In a recent post I compared the actual usable capacity of Nutanix ADSF and VMware vSAN on the same hardware, and we saw approximately 30-40% more usable capacity delivered by Nutanix.

The next logical step is to compare data reduction/efficiency technologies which apply further storage capacity efficiencies to the usable capacity.

In the previous post I declared “a wash” between the two products for the sake of simplifying the usable capacity comparison. However the reality is it’s not a wash, as the Nutanix platform can deliver more usable capacity from the same hardware. Combine that with a superior storage layer in the form of the Acropolis Distributed Storage Fabric (ADSF), which complements data reduction technologies, and vSAN is bringing a spoon to a gun fight.

In this post we’ll look at Deduplication and Compression.

Data reduction technologies such as compression and deduplication are proven technologies which provide varying levels of value to customers depending on their datasets.

During these comparisons, marketing material, in many cases “tick-box” style slides, is used by vendors’ sales reps. Worse still, it is commonly taken at face value, which can lead to incorrect assumptions for critical architectural/sizing considerations such as capacity, resiliency and performance.

It reminds me of a post I wrote way back in 2014 titled: Not all VAAI-NAS storage solutions are created equal where I highlighted that while multiple vendors supported VAAI-NAS for vSphere, not all vendors supported all the primitives (features) of VAAI-NAS which provided valuable functional, capacity efficiency and performance improvements by avoiding unnecessary IO operations.

Now some 6 years later, not much has changed as these tick-box marketing comparisons still plague the industry.

Let me give you a simple example:

| Feature | Nutanix | VMware vSAN |
|---|---|---|
| Deduplication | ✅ | ✅ |
| Compression | ✅ | ✅ |

The above table shows that both deduplication and compression are supported by Nutanix and VMware vSAN.

With this information, customers/partners/VARs can and do conclude the data reduction capabilities of both products are the same, or at least that the difference is insignificant for a purchasing or architectural decision.

When this happens, they are mistaken, for many critical reasons.

The following table shows which data reduction configurations are currently supported by both products:

| Feature | Nutanix All Flash | Nutanix Hybrid | vSAN All Flash | vSAN Hybrid |
|---|---|---|---|---|
| Deduplication | ✅ | ✅ | ✅ | ❌ |
| Compression | ✅ | ✅ | ✅ | ❌ |

Here we learn that vSAN does not support data reduction technologies on hybrid configurations (Flash + HDD). This gives customers choosing hybrid platforms a clear reason to go with Nutanix, as they will more than likely be able to store significantly more data on the same or even less hardware.

The messaging from VMware has consistently been that they do not support data reduction on hybrid intentionally because data reduction on hybrid (HDD) tiers should not be done for performance reasons.

I’ll address this claim later in the post.

Let’s continue to dive into this and see what other differences the products have.

General Deduplication & Compression concepts

Data reduction technologies have advantages and disadvantages, for example deduplication may provide >10:1 efficiency on VDI datasets but minimal or no savings for other datasets.

For data which is already compressed (e.g.: At the application layer), storage layer compression may also not provide much value.

Both technologies also require some CPU/RAM from a storage controller (physical, virtual, in-kernel, or Controller VM) to perform these functions. This “cost” in resources, and potentially performance, needs to be weighed against the capacity savings, and this can vary substantially between customers.

As such, to avoid unnecessary overheads on the hosts (vSAN Kernel / Nutanix CVM), it’s important to consider when/where to use these technologies.

A “brute force” application of compression & deduplication will rarely result in the outcome people are sold and/or typically expect. On the contrary, it will likely significantly impact performance, especially where the data reduction ratio is low (e.g.: <1.2:1), so flexibility is key.

The following table shows how data reduction technologies can be configured:

| Data Reduction Configuration | Nutanix | VMware vSAN |
|---|---|---|
| Compression Only | ✅ | ❌ |
| Deduplication Only | ✅ | ❌ |
| Compression & Deduplication | ✅ | ✅ |

With vSAN, compression and deduplication must be enabled together, and for the entire vSAN cluster. This means customers lose the flexibility to enable only the most suitable technology for their dataset AND incur the overheads of unnecessarily applying compression or deduplication.


This is a major issue and will lead to inefficiency and may force customers into creating multiple silos, further reducing efficiency and increasing overheads for resiliency (e.g.: N+1 node per cluster).

e.g.: A customer has a 16 node cluster, half the workload gets 2:1 compression and deduplication and performs well, the other half are business critical applications which are suffering performance issues. To resolve this problem, the vSAN customer needs to either disable Deduplication and compression for the entire cluster and lose the 2:1 capacity savings (which likely leads to additional node purchases) OR they have to split the cluster into two or more clusters to enable them to continue using deduplication and compression where it’s delivering value.

Question: How do you split a 16 node vSAN cluster into 2 without significant downtime and/or additional hardware? (Not to mention the effort involved in planning the process).

With Nutanix, you simply toggle OFF compression or deduplication or both for the workloads which are not performing to the required level. Thus eliminating the need for a complex, risky and time consuming project splitting up a vSAN cluster due to data reduction technology constraints.

Deduplication boundary:

How and where deduplication is applied can have a significant impact in the efficiency it can deliver. Let’s compare the deduplication “boundary” for both products.

| Deduplication Boundary | Nutanix | VMware vSAN |
|---|---|---|
| Global | ✅ | ❌ |
| Per Node/Host | N/A | ❌ |
| Per Disk Group | N/A | ✅ |

For Nutanix it’s simple, any data within the storage container can be deduplicated with any other data anywhere in the cluster. For a 16 node cluster with 16 copies of the same data, if you enable deduplication, Nutanix will reduce this to 1 copy (protected with RF2 or RF3) resulting in a 16:1 efficiency for that data.

For the same example, vSAN would not achieve any efficiency as it only deduplicates within a disk group. VMware recommends at least two disk groups per node for optimal performance, which means their implementation of deduplication will allow multiple copies of data even within the same host (where the duplicates are across disk groups).

In the same 16 node cluster example where Nutanix achieved 16:1 efficiency, vSAN would store 16x the data of Nutanix due to the constraint of deduplication being only performed on a per disk group basis.

If the vSAN environment used two disk groups per node, and duplicate data was on each disk group you would potentially have 32 copies (16 nodes * 2 disk groups) of the same data due to the boundary for deduplication being at a disk group layer (as opposed to global).

In the real world, the result could be 16-32x, but is more likely to be much lower, say 1.5:1 or 2:1 for mixed workloads. The point is, the potential efficiency of vSAN is artificially reduced by the underlying architecture, and Nutanix does not suffer from this problem.
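To make the boundary difference concrete, here is a toy Python model (my own illustration, not vendor code) of how many copies of a single duplicated block survive deduplication under a global boundary versus a per disk group boundary, ignoring RF2/FTT1 replicas for simplicity:

```python
def copies_stored(nodes, disk_groups_per_node, boundary):
    """Toy model: copies of one duplicated block remaining after dedupe.

    Assumes the duplicate block initially exists once per disk group.
    "global" dedupes across the whole cluster; "disk_group" only
    dedupes within each disk group. Replication factor is ignored.
    """
    total_copies = nodes * disk_groups_per_node
    if boundary == "global":
        return 1                   # one logical copy cluster-wide
    elif boundary == "disk_group":
        return total_copies        # no cross-disk-group dedupe possible
    raise ValueError(f"unknown boundary: {boundary}")

# 16 node cluster, 2 disk groups per node (VMware's recommended minimum)
print(copies_stored(16, 2, "global"))      # 1  -> up to 32:1 for this data
print(copies_stored(16, 2, "disk_group"))  # 32 -> no savings for this data
```

This is of course a worst case for highly duplicated data; real mixed workloads land somewhere in between, but the ceiling on efficiency is set by the boundary.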

The counter argument from VMware is that global deduplication requires global metadata, which may have a higher resource requirement (CPU/RAM) than the much simpler, less efficient, vSAN implementation. Well, VMware are not wrong with this statement, but as Nutanix ADSF is a true distributed architecture, it already uses global metadata for the storage fabric, which is why Nutanix has supported deduplication since way back in 2013, while vSAN only “bolted on” deduplication and compression in 2016 with vSAN 6.2.

Another argument against Nutanix global dedupe could be the loss of some data locality, but I can’t recall ever hearing this argument, because it would mean VMware conceding that Nutanix’ unique implementation of data locality is valuable, which they smartly steer away from, as one of vSAN’s major architectural flaws is its inability to intelligently service writes where the VM resides after the VM moves from the host it was created on.

As we’ve learned from my Usable Capacity Comparison, the distributed storage fabric (using its global metadata) also provides significantly more usable capacity vs RAW when compared to vSAN, which would justify any perceived or real additional resource overhead. Global metadata also allows for features like storage only nodes to add capacity, resiliency and performance without any user intervention, so global metadata comes at a cost but delivers value in spades.

Next up we’ll look deeper at how, when and where the data reduction takes place:

The following table shows the storage tiers the data reduction technologies are supported for both products:

| Data Reduction Tiers | Nutanix | VMware vSAN |
|---|---|---|
| Write Buffer / Cache | ✅ | ❌ |
| Capacity Tier | ✅ | ✅ |

vSAN data reduction technologies are NOT applied in the high performance “cache” tier, whereas they are with Nutanix, where in-line compression for writes is on by default and deduplication is supported.

With vSAN, deduplication and compression are only applied once the data is cold and de-staged to the capacity tier. Going back to my earlier reference that VMware has consistently said they do not support data reduction on hybrid intentionally for performance reasons, it begs the question why they do not apply these valuable technologies to their “all flash” cache tier, especially considering the write cache is limited to just 800GB per disk group.

The Nutanix write buffer (oplog) has compression enabled by default under the covers and cannot be disabled via the PRISM GUI. In testing it’s been shown to be so valuable with minimal overheads that it was somewhat “hard coded” on.

Check out: What are the performance impacts & overheads of Inline Compression on Nutanix? for an example.

Let’s say the Nutanix flash tier was the same size as vSAN’s maximum write cache, 800GB. With compression enabled, even assuming a conservative 1.5:1 efficiency ratio, the effective flash tier increases to 1.2TB, meaning significantly more data is served from the fastest possible tier regardless of whether it’s SSD+HDD (hybrid) or NVMe+SSD.
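The effective cache arithmetic is simple enough to sketch. The 800GB cache size and 1.5:1 ratio are the illustrative figures from this example, not measured values:

```python
def effective_capacity_gb(raw_gb, compression_ratio):
    """Effective data held by a tier when in-line compression is applied."""
    return raw_gb * compression_ratio

# 800GB write cache with a conservative 1.5:1 in-line compression ratio
print(effective_capacity_gb(800, 1.5))  # 1200.0 -> 1.2TB of effective cache
```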

Now to address VMware’s claim around not enabling data efficiency on Hybrid platforms:

Let’s say Nutanix compression adds a massive 50% latency penalty (which it doesn’t as per the earlier reference, but hear me out), and the average combined read/write latency without compression is 2ms.

Add 50% to that and we get 3ms average latency but we now have 50% more data (assuming a conservative 1.5:1 compression ratio) in the faster tier (e.g.: NVMe or SSD) as opposed to the same 50% of data being serviced by SATA (in the case of Hybrid) where latency would be more like 10ms on average.

This simple example shows that data reduction on hybrid systems can be, and frequently is, a major performance ADVANTAGE!
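As a sanity check on this argument, here is a minimal weighted-latency model using the illustrative figures above (2ms flash, 10ms HDD, a pessimistic 50% compression penalty, and a working set 1.5x the raw flash tier). It is a sketch under those stated assumptions, not a benchmark:

```python
def avg_latency_ms(working_set, flash_capacity, flash_ms, hdd_ms):
    """Average latency when part of the working set spills out of flash."""
    flash_fraction = min(flash_capacity, working_set) / working_set
    return flash_fraction * flash_ms + (1 - flash_fraction) * hdd_ms

flash_raw = 1.0    # normalised flash tier size
working_set = 1.5  # working set is 1.5x the raw flash tier

# No compression: a third of the working set is served from HDD at 10ms
no_comp = avg_latency_ms(working_set, flash_raw, flash_ms=2.0, hdd_ms=10.0)

# 1.5:1 compression: the whole working set fits in flash, even assuming
# a pessimistic 50% latency penalty on the flash tier (2ms -> 3ms)
with_comp = avg_latency_ms(working_set, flash_raw * 1.5, flash_ms=3.0, hdd_ms=10.0)

print(round(no_comp, 2), round(with_comp, 2))  # 4.67 3.0
```

Even with the deliberately unfair 50% penalty, the compressed hybrid configuration comes out well ahead on average latency.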

With that said, customers often choose hybrid for use cases which host lots of cold data, in which case it’s infrequently accessed and even if performance was impacted, a 1.5:1 or 2:1 efficiency is likely well worth a potential performance penalty.

At 2:1 that’s HALF the nodes required to store the same data! Nutanix Customers could enable compression and spend some of the saved money on additional nodes to increase performance and still end up with a NET saving and a great business outcome!

For all flash systems with mixed drive types (NVMe & SATA-SSD), even a 1.5:1 ratio means 50% more data served by NVMe compared to SATA-SSD, which would provide some latency and performance benefits, albeit less of an advantage than on Flash & HDD platforms.

That is a pretty solid outcome even if we assume some impact to latency.

Important note: The vSAN cache is typically the highest cost flash device (e.g.: NVMe or Enterprise Grade SSD) so the more data you can squeeze into that tier, the better your ROI (and your performance!)

But with vSAN, data reduction is only applied to COLD data, which again goes against VMware’s claim that they don’t support data reduction on hybrid for performance reasons as they do not use data reduction for the cache tier.

vSAN applies deduplication and then compression as it moves data from the cache tier to the capacity tier. Reference: https://docs.vmware.com/en/VMware-vSphere/6.5/com.vmware.vsphere.virtualsan.doc/GUID-3D2D80CC-444E-454E-9B8B-25C3F620EFED.html

Nutanix applies data reduction to all data, including in the oplog (persistent write buffer) and ensures the maximum amount of hot data is stored in flash regardless of the configuration (Hybrid or all flash).

What are the options of DISABLING data reduction features?

The below table shows the options customers have to reduce or remove data reduction configuration:

| Options to Disable Data Reduction | Nutanix | VMware vSAN |
|---|---|---|
| Compression Only | ✅ | ❌ |
| Deduplication Only | ✅ | ❌ |
| Compression & Deduplication | ✅ | ✅ |

With vSAN, as mentioned earlier, compression and deduplication are “all or nothing”. On one hand you might be getting good compression rates (say >1.5:1), but on the other hand you may experience a high impact on latency or low deduplication rates. With vSAN you need to compromise between performance and capacity efficiency due to the lack of choice.

I’d say that makes VMware’s argument around overheads and efficiency of global metadata a moot point.

Side note: Before vSAN supported compression and deduplication, I was at a vChampions event in Sydney, Australia, and I vividly recall the messaging was “It’s not required due to support for large capacity SATA drives which are low cost”. Of course we’re not stupid, and we all know this was really just an excuse to buy VMware time before they could implement some form of data efficiency to compete with Nutanix and traditional storage arrays.

With Nutanix, these settings can be toggled on/off in real time and also in a granular manner to ensure the best balance between capacity efficiency and performance.

VMware also claim that global deduplication is costly on storage controller (CPU/RAM) resources, which in fairness is not wrong per se, but it is misleading, as all data efficiency has a cost/benefit regardless of vendor. The key question is what return we get for that “cost” and whether it is worth it.

Here is an article on the Cost vs Reward for the Nutanix Controller VM (CVM).

VMware rightly concede deduplication has overheads which need to be weighed up versus the benefits (which vary from customer to customer), yet their implementation is tied to being enabled with compression and, worse still, activated at the cluster level. I can’t imagine that this would ever have been a design goal; more likely it was the best implementation available given the constraints of vSAN’s underlying architecture.

If “global metadata” being “too costly” was a genuine reason for VMware not to do global dedupe, it begs the question why they went with a per disk group rather than a per node solution. No global metadata is required for per node dedupe, and the concept of per node deduplication has some genuine advantages while suffering some, albeit not all, of the downsides of the current vSAN implementation.

If limiting vSAN dedupe to a “failure domain” were a genuine design goal, it might have some merit, but the vSAN architecture is already constrained by the disk group concept. It’s more realistic to say vSAN dedupe is limited to a per disk group basis as a consequence of the architecture, rather than that being a genuine architectural design goal, which in my opinion doesn’t make sense.

Combine this with the fact that vSAN is not a truly distributed storage solution like Nutanix ADSF, and the real reason is more likely that implementing a global metadata layer in vSAN just to support global deduplication would be way too resource intensive and would introduce significant layers of complexity into the product, which is not justified for just one feature.

As Nutanix was designed from the ground up with global metadata, implementing global deduplication was the obvious choice as it was a simple extension of the existing architecture.

With Nutanix if you feel the cost/benefit of deduplication is not worth it, you can still enjoy the benefits of Compression and Erasure Coding (EC-X) and deduplication can be disabled on the fly without any downsides as we’ve already learned.

Again, with vSAN you cannot turn off just deduplication, so the claim that deduplication (global or not) is too costly is a little silly when vSAN forces you to use two potentially costly technologies (compression and deduplication) together, when in many cases you want one or the other, and not both! Note that vSAN compression is only applied to data which compresses at a minimum 2:1 ratio, so customers are paying the cost of compression and only getting a return on that resource investment if the dataset is compressible at >2:1.

With that said, deduplication is probably the most overrated feature in enterprise storage and rarely provides anywhere near the savings some vendors promise. Nutanix provides significant levels of space efficiency under the covers with technology like metadata clones (VAAI-NAS for ESXi and natively with AHV) and zero suppression in the write path, and as a result, the reported deduplication savings may appear lower than those of other vendors who misleadingly represent the numbers.

As such I typically recommend customers leave deduplication disabled, as the savings from compression & Erasure Coding (EC-X), combined with metadata clones, the elimination of silos and zero suppression, deliver excellent technical and business outcomes with the least possible resource usage.

When customers see large reported savings from deduplication, it is frequently due to misleading reporting such as reporting snapshots or metadata copies as deduplication as I discussed here: Deduplication ratios – What should be included in the reported ratio?

What is the impact of DISABLING data reduction features?

VMware often promote that their Storage Based Policy Management (SBPM) makes decisions around data efficiency and data protection (FTT vs RAID5/6) easy, as you can just change the settings. While it’s true you can change the settings, doing so has significant impacts on cluster performance and resiliency which need to be carefully considered, leading many customers to only perform these changes during maintenance windows, and almost always out of business hours, due to the long duration and high impact.

The following table shows 4 major impacts when disabling Compression and Deduplication on vSAN, none of which are applicable to Nutanix:

| Impact | Nutanix | VMware vSAN |
|---|---|---|
| (1) Full evacuation of data from disk group | ❎ | 🛑 |
| (2) Change of disk format | ❎ | 🛑 |
| (3) Disk group capacity unavailable during operation | ❎ | 🛑 |
| (4) Temporary reduced resiliency | ❎ | 🛑 |

The following quotes are from VMware’s documentation confirming the above.

vSAN Full Evacuation, Change of Disk format & Capacity unavailable (disk group removed):

While disabling deduplication and compression, vSAN changes the disk format on each disk group of the cluster. It evacuates data from the disk group, removes the disk group, and recreates it with a format that does not support deduplication and compression. The time required for this operation depends on the number of hosts in the cluster and amount of data. You can monitor the progress on the Tasks and Events tab. Reference: https://docs.vmware.com/en/VMware-vSphere/6.7/com.vmware.vsphere.virtualsan.doc/GUID-5A01D0C3-8E6B-44A7-9B0C-5539698774CC.html

vSAN Temporary Reduced Resiliency:

As a result, temporarily during the format change for deduplication and compression, your virtual machines might be at risk of experiencing data loss. vSAN restores full compliance and redundancy after the format conversion is completed. Reference: https://docs.vmware.com/en/VMware-vSphere/6.7/com.vmware.vsphere.virtualsan.doc/GUID-125B2B04-FBB9-43AB-8AF9-E7179734BC7C.html

Nutanix on the other hand allows on the fly changes without all these downsides. The process is handled as a low priority task by Curator and processed over time, while the new setting is immediately applied to incoming IO.

Let’s discuss the resiliency considerations with data reduction technologies.

With vSAN, using data reduction technologies significantly increases the impact of failures. The following resiliency scenarios highlight the advantage of Nutanix, which does not suffer from any of these constraints.

| Resiliency | Nutanix | VMware vSAN |
|---|---|---|
| Full resiliency during data efficiency setting changes (On/Off) | ✅ | ❌ |
| A single drive (capacity/cache) failure only loses data on the one failed drive | ✅ | ❌ |
| Removing or adding drives does not impact any other drives | ✅ | ❌ |
| Write cache/buffer drive failure does not cause multiple drives to become unavailable | ✅* | ❌ |

*Except in single SSD hybrid nodes

Key point: With Nutanix, Resiliency is NEVER compromised as a result of any data efficiency setting/change.

The below quote clarifies the failure scenarios for vSAN:

If a capacity disk fails, the entire disk group becomes unavailable. To resolve this issue, identify and replace the failing component immediately. When removing the failed disk group, use the No Data Migration option. Reference: https://docs.vmware.com/en/VMware-vSphere/6.7/com.vmware.vsphere.virtualsan.doc/GUID-AA72CA1D-803D-4D1D-87BB-E7D86EC947D2.html

Minimum compression ratios

vSAN only compresses data (following deduplication) if the 4KB block compresses to <=2KB, meaning unless your data is compressible at a 2:1 ratio or higher, vSAN customers get NO compression savings.

Nutanix’ minimum compression ratio is currently 30%, meaning Nutanix customers will have a significant capacity advantage on datasets which achieve <2:1 compression but more than 1.3:1.

A fairly weak argument could be made that 30% is too low; my personal opinion is 1.3-1.5:1 is probably the sweet spot. But a very strong argument can be made that vSAN’s 2:1 minimum is robbing customers of significant data reduction savings.
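The two minimum-ratio policies can be sketched as per-block checks. The 4KB to <=2KB rule for vSAN is as described above; interpreting the Nutanix 30% figure as “at least 30% space saved” is my assumption for this illustration:

```python
def vsan_stores_compressed(raw_bytes, compressed_bytes):
    """vSAN only keeps the compressed form if the block shrinks to <=50% (2:1)."""
    return compressed_bytes <= raw_bytes / 2

def nutanix_stores_compressed(raw_bytes, compressed_bytes):
    """Assumed Nutanix rule: keep the compressed form if it saves >=30% (~1.43:1)."""
    return compressed_bytes <= raw_bytes * 0.70

# A 4096 byte block that compresses to 2600 bytes (~1.58:1, a ~36% saving)
raw, comp = 4096, 2600
print(vsan_stores_compressed(raw, comp))     # False -> customer gets NO saving
print(nutanix_stores_compressed(raw, comp))  # True  -> the ~36% saving is kept
```

Any dataset landing between roughly 1.43:1 and 2:1 falls into the gap where Nutanix keeps the savings and vSAN discards them.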

Data Efficiency Advantage example calculations.

Finally let’s do some quick math to see what the advantages of Nutanix’ combined higher usable capacity and more comprehensive data reduction technologies might look like for customers:

In my previous post, Usable Capacity between Nutanix ADSF vs VMware vSAN, I showed a 16 node cluster example where Nutanix had a 41.25% usable capacity advantage over vSAN, with 71TB usable vs 41TB for an RF2/FTT1 configuration.

Let’s use those numbers for this example:

| Nutanix RF2 (FTT1) | TB |
|---|---|
| Usable | 71.34 |
| With 1.5:1 data reduction | 107.01 |
| With 2:1 data reduction | 142.68 |
| With additional 20% data reduction efficiency | 171.22 |

| vSAN FTT1 (RF2) | TB |
|---|---|
| Usable | 41.91 |
| With 1.5:1 data reduction* | 62.87 |
| With 2:1 data reduction | 83.83 |

*Dedupe only. Compression is not applied at <2:1.

| Usable TB advantage for Nutanix | TB |
|---|---|
| 1.5:1* | 44.14** |
| 2.0:1 | 58.85** |
| 2.0:1 + 20% architecture advantage | 87.39 |

*Likely higher as compression is applied at >1.3:1.

** Values updated due to a previous miscalculation.
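The advantage figures can be reproduced with straightforward arithmetic. A quick sketch using the usable capacity figures from the previous post’s example (small rounding differences versus the published numbers are expected):

```python
nutanix_usable = 71.34  # TB usable, RF2/FTT1, 16 node example from previous post
vsan_usable = 41.91     # TB usable, FTT1/RF2, same hardware

# Apply the same data reduction ratios used in the comparison
nutanix_15 = nutanix_usable * 1.5   # ~107.01 TB
nutanix_20 = nutanix_usable * 2.0   # ~142.68 TB
nutanix_20_plus = nutanix_20 * 1.2  # ~171.22 TB with 20% architecture advantage

vsan_15 = vsan_usable * 1.5         # ~62.87 TB (dedupe only)
vsan_20 = vsan_usable * 2.0         # ~83.82 TB

# Nutanix usable TB advantage at each ratio
print(round(nutanix_15 - vsan_15, 2))       # ~44.1 TB
print(round(nutanix_20 - vsan_20, 2))       # ~58.9 TB
print(round(nutanix_20_plus - vsan_20, 2))  # ~87.4 TB
```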

In the above table we see that due to the large usable capacity advantage, when data reduction is applied, even at the same ratio as vSAN achieves, the effective capacity advantage grows significantly.

If we assume a conservative 20% advantage due to Nutanix’s superior data reduction architecture, from applying data reduction globally, to the persistent write buffer (equivalent to vSAN’s cache layer) AND for compression ratios <2:1, we see the effective capacity advantage of 87.39TB (the advantage, not the total) is more than vSAN’s entire usable capacity with data reduction.

In this example, Nutanix provides 171TB usable (RF2/FTT1) and vSAN only provides 83.83TB, around 50% less usable capacity than Nutanix.

Let’s now consider that if customers are educated on the resiliency issues with vSAN, they may choose to avoid using some/all of vSAN’s data reduction technologies to improve resiliency, in which case the advantage is even greater for Nutanix.

Summary:

We’ve covered how marketing slides or sales pitches showing both products supporting the same features can be very misleading.

In this post we’ve covered a wide range of factors regarding the implementation of Data Reduction technologies and how the underlying architectures have a major impact on the real world value of these features.

If we combine the 20-40%+ usable capacity advantage Nutanix has over vSAN WITHOUT data reduction applied, then any achieved data reduction ratio (even if it’s the same ratio as vSAN achieves) increases the advantage further, especially as we’ve learned Nutanix applies data reduction to all tiers of storage.

Putting aside the obvious capacity advantages, Nutanix never compromises Resiliency when data reduction technologies are used while allowing these features to be enabled/disabled on a granular basis without reformatting or major back end overheads.

If the resiliency of the data is ever compromised as a direct result of using data reduction technologies, that’s not a minimally viable implementation.

When data reduction technologies cannot be applied to the fastest and most expensive tier of storage (e.g.: NVMe or Enterprise Grade SSD i.e.: vSANs cache), the customer is losing out on typically very significant performance improvement and ROI (getting more out of that expensive storage).

When enabling data reduction makes the impact of ANY single drive failure in a vSAN disk group (1 cache and up to 7 capacity drives) cause the entire disk group to go offline and need to be rebuilt, the risk vs reward is just not worth it.

By investing heavily from day 1 in creating a truly distributed storage fabric, Nutanix has not only delivered more usable capacity vs RAW compared to more rudimentary products like vSAN, but has also been able to implement data efficiency technologies which further drive efficiencies and give customers flexibility.

In my next post, I cover Erasure Coding Comparison – Nutanix ADSF vs vSAN
