I was looking into some behavior recently to assist one of our partners. He described a situation that they observed during proof-of-concept testing. I thought it would be worth highlighting this behavior in case you also observe it and are curious as to why it is happening.

Let’s begin with a description of the test. The customer has a 7-node vSAN cluster, and has implemented RAID-6 erasure coding for all VMs across the board. The customer isolated one host, and as expected, the VMs continued to run without issue. The customer was also able to clone virtual machines on vSAN and take snapshots on vSAN. No problems there. Next the customer introduced another issue by isolating a second host. This meant that there were now only 5 ESXi hosts running in the cluster. Again, as expected, this did not impact the VMs. They continued to run fine, and remained accessible (RAID-6 erasure coding allows VMs to tolerate 2 failures on vSAN). However, when the customer next went ahead and tried to take some snapshots and clones, he encountered the following error:

"an error occurred while taking a snapshot: out of resources" "There are currently 5 usable fault domains. The operation requires 1 more usable fault domains."

Let’s explain why this occurs:

Let’s start with snapshots, as that is easy to explain. Snapshots always inherit the same policy as the parent VMDK. Since the parent VMDK in this situation has a RAID-6 configuration which requires 6 physical ESXi hosts (4 data segments + 2 parity segments), and there are now only 5 hosts remaining in the cluster, we are not able to create an object with a configuration which adheres to the policy requirement. That’s straightforward enough.
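The host-count arithmetic behind the error can be sketched as follows. This is purely an illustrative helper (not a vSAN API); the function name and the simplified rule that each component must land on a distinct host are assumptions for the example:

```python
# Illustrative sketch (not a vSAN API): minimum usable fault domains
# (equal to hosts, when no explicit fault domains are configured)
# needed to place a new object under each vSAN storage policy.

def min_fault_domains(failures_to_tolerate: int, erasure_coding: bool) -> int:
    """Return the minimum host count needed to provision an object."""
    if erasure_coding:
        if failures_to_tolerate == 1:
            return 4   # RAID-5: 3 data + 1 parity segments
        if failures_to_tolerate == 2:
            return 6   # RAID-6: 4 data + 2 parity segments
        raise ValueError("erasure coding supports FTT=1 or FTT=2 only")
    # RAID-1 mirroring: FTT+1 replicas plus FTT witness components
    return 2 * failures_to_tolerate + 1

# The scenario above: RAID-6 (FTT=2, erasure coding), two hosts isolated.
usable_hosts = 7 - 2
needed = min_fault_domains(2, erasure_coding=True)
print(needed)                 # 6
print(needed - usable_hosts)  # 1 -- "requires 1 more usable fault domains"
```

With 5 usable hosts and 6 required, the shortfall of 1 matches the error message the customer saw.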

But what about clones? What if we cloned, but we selected a different policy which did not require the same number of physical hosts?

Unfortunately, this will not work either. When we clone a running VM on vSAN, we snapshot the VMDKs on the source VM before cloning them to the destination VM. Once again, these snapshots inherit the same policy as the parent VMDK, and do not use the policy of the destination VMDK. For example, here is a VM with 2 VMDKs that I cloned. Both have the vSAN default datastore policy, as does the VM home namespace object. I’m using RVC, the Ruby vSphere Console, available on all vCenter Servers:

/vcsa-06/DC/computers> vsan.vm_object_info Cluster/resourcePool/vms/win-2012-2/
VM win-2012-2:
  Namespace directory
    DOM Object: be71e758-6b8e-d700-23a0-246e962f48f8 (v5, owner: esxi-dell-k.rainpole.com, proxy owner: None, policy: hostFailuresToTolerate = 1, spbmProfileId = aa6d5a82-1c88-45da-85d3-3d74b91a5bad, proportionalCapacity = [0, 100], forceProvisioning = 0, CSN = 85, spbmProfileName = Virtual SAN Default Storage Policy, SCSN = 87, cacheReservation = 0, stripeWidth = 1, spbmProfileGenerationNumber = 0)
      RAID_1
        Component: be71e758-260c-3201-35e6-246e962f48f8 (state: ACTIVE (5), host: esxi-dell-j.rainpole.com, md: naa.500a07510f86d6ae, ssd: naa.55cd2e404c31f8f0, votes: 1, usage: 0.5 GB, proxy component: false)
        Component: be71e758-d9b0-3301-14b7-246e962f48f8 (state: ACTIVE (5), host: esxi-dell-k.rainpole.com, md: naa.500a07510f86d6ca, ssd: naa.55cd2e404c31e2c7, votes: 1, usage: 0.5 GB, proxy component: false)
      Witness: 52e3ef58-f9db-bf03-5914-246e962f48f8 (state: ACTIVE (5), host: witness-02.rainpole.com, md: mpx.vmhba1:C0:T1:L0, ssd: mpx.vmhba1:C0:T2:L0, votes: 1, usage: 0.0 GB, proxy component: false)
  Disk backing: [vsanDatastore (1)] be71e758-6b8e-d700-23a0-246e962f48f8/win-2012-2.vmdk
    DOM Object: c071e758-3862-0523-ea3c-246e962f48f8 (v5, owner: esxi-dell-k.rainpole.com, proxy owner: None, policy: hostFailuresToTolerate = 1, spbmProfileId = aa6d5a82-1c88-45da-85d3-3d74b91a5bad, proportionalCapacity = 0, forceProvisioning = 0, CSN = 86, spbmProfileName = Virtual SAN Default Storage Policy, SCSN = 85, cacheReservation = 0, stripeWidth = 1, spbmProfileGenerationNumber = 0)
      RAID_1
        Component: c071e758-9c2e-bd23-b93b-246e962f48f8 (state: ACTIVE (5), host: esxi-dell-j.rainpole.com, md: naa.500a07510f86d684, ssd: naa.55cd2e404c31f8f0, votes: 1, usage: 8.6 GB, proxy component: false)
        Component: c2d31259-be67-d9aa-9520-246e962c23f0 (state: ACTIVE (5), host: esxi-dell-l.rainpole.com, md: naa.500a07510f86d6cf, ssd: naa.55cd2e404c31f9a9, votes: 1, usage: 8.6 GB, proxy component: false)
      Witness: 6fd41259-7ab9-59e0-ddf8-246e962c23f0 (state: ACTIVE (5), host: witness-02.rainpole.com, md: mpx.vmhba1:C0:T1:L0, ssd: mpx.vmhba1:C0:T2:L0, votes: 1, usage: 0.0 GB, proxy component: false)
  Disk backing: [vsanDatastore (1)] be71e758-6b8e-d700-23a0-246e962f48f8/win-2012-2_1.vmdk
    DOM Object: 2072e758-4dd9-ebef-1221-246e962f48f8 (v5, owner: esxi-dell-k.rainpole.com, proxy owner: None, policy: hostFailuresToTolerate = 1, spbmProfileId = aa6d5a82-1c88-45da-85d3-3d74b91a5bad, proportionalCapacity = 0, forceProvisioning = 0, CSN = 80, spbmProfileName = Virtual SAN Default Storage Policy, SCSN = 84, cacheReservation = 0, stripeWidth = 1, spbmProfileGenerationNumber = 0)
      RAID_1
        Component: 2072e758-ce95-11f1-6ab2-246e962f48f8 (state: ACTIVE (5), host: esxi-dell-l.rainpole.com, md: naa.500a07510f86d695, ssd: naa.55cd2e404c31f9a9, votes: 1, usage: 40.3 GB, proxy component: false)
        Component: 2072e758-d8e5-12f1-8d83-246e962f48f8 (state: ACTIVE (5), host: esxi-dell-i.rainpole.com, md: naa.500a07510f86d6ab, ssd: naa.55cd2e404c31ef8d, votes: 1, usage: 40.3 GB, proxy component: false)
      Witness: 55e3ef58-41dc-6e77-c6d6-246e962f4ab0 (state: ACTIVE (5), host: witness-02.rainpole.com, md: mpx.vmhba1:C0:T1:L0, ssd: mpx.vmhba1:C0:T2:L0, votes: 1, usage: 0.0 GB, proxy component: false)
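To see why picking a different destination policy does not help, the clone workflow described above can be sketched in a few lines. This is an illustrative model only (not vSAN code); the function names and the simplified host-count table are assumptions for the example. The key point is that the transient snapshot in step 1 is created with the source VMDK's policy, so the destination policy never gets a chance to matter:

```python
# Illustrative sketch (not vSAN code): cloning a running VM first
# snapshots the source VMDKs, and that snapshot inherits the SOURCE
# policy -- the destination policy only applies to the copied object.

# Simplified lookup of hosts needed per policy, for this example only.
HOSTS_REQUIRED = {"RAID-1 FTT=1": 3, "RAID-5": 4, "RAID-6": 6}

def can_clone_running_vm(source_policy: str, dest_policy: str,
                         usable_hosts: int) -> bool:
    # Step 1: transient snapshot of the source VMDK -- uses source_policy.
    if usable_hosts < HOSTS_REQUIRED[source_policy]:
        return False  # "out of resources", regardless of dest_policy
    # Step 2: clone to the destination object -- uses dest_policy.
    return usable_hosts >= HOSTS_REQUIRED[dest_policy]

# The customer's case: RAID-6 source, cheaper destination policy, 5 hosts.
print(can_clone_running_vm("RAID-6", "RAID-1 FTT=1", usable_hosts=5))  # False
```

Even though a RAID-1 FTT=1 destination would fit comfortably on 5 hosts, the clone fails at the snapshot step because the source VMDK's RAID-6 policy still demands 6.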

Let’s now see what happens to these objects when I clone the VM while it is running: