Erasure coding (EC) is a protection mechanism that offers the benefits of RAID protection with much greater space savings. EC provides data protection while avoiding the overhead of doubling or tripling the size of the data, as two-way or three-way replication does. Similar to RAID 5 and 6, EC uses parity to rebuild data that is lost or corrupted. A corrupted piece of data is reconstructed from the remaining data and parity that exist on other disks or nodes.
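
The rebuild idea can be illustrated with the simplest parity scheme, a single XOR parity block as used in RAID 5. This is a minimal sketch, not how any particular product implements EC: the helper name `xor_blocks` and the sample byte strings are made up for illustration, and schemes with more than one parity stripe need stronger codes (Reed-Solomon is a common choice) than plain XOR.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks, as if stored on three different disks or nodes.
data = [b"ABCD", b"EFGH", b"IJKL"]

# The parity block would be stored on a fourth disk or node.
parity = xor_blocks(data)

# Simulate losing the second block, then rebuild it from the survivors.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt)  # b'EFGH'
```

Because XOR cancels itself out, combining the surviving data blocks with the parity block leaves exactly the missing block.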

EC uses mathematical oversampling to reconstruct original data by using additional data. If you have “k” data stripes to be protected, EC creates “m” additional parity stripes that can be used to reconstruct the original data. So, the notation EC k/m means that “m” parity stripes protect “k” data stripes. If, for example, you have a total of four nodes and want to protect against a one-node failure (as in RAID 5), use EC 3/1 so that one parity stripe is used for three data stripes. If you have six nodes and want to protect against a two-node failure (as in RAID 6), use EC 4/2.
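
The two examples above follow a simple rule that can be sketched in a few lines. The function name `ec_scheme` is hypothetical, and the sketch assumes that every stripe of a stripe set lands on its own node, so that k + m equals the number of nodes; real systems may use narrower stripe sets than the cluster size.

```python
def ec_scheme(nodes, failures):
    """Return (k, m) for a cluster, assuming one stripe per node."""
    if failures < 1 or failures >= nodes:
        raise ValueError("failures must be between 1 and nodes - 1")
    m = failures      # one parity stripe per tolerated node failure
    k = nodes - m     # the remaining nodes hold data stripes
    return k, m


for nodes, failures in [(4, 1), (6, 2)]:
    k, m = ec_scheme(nodes, failures)
    print(f"{nodes} nodes, {failures} failure(s): EC {k}/{m}")
```

Under that assumption, four nodes with one tolerated failure gives EC 3/1, and six nodes with two tolerated failures gives EC 4/2.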

The benefits of erasure coding in protecting data while optimizing storage become significant as storage increases and becomes geographically dispersed. Assume that you have 10 TB of raw storage spread over five nodes, where each node has 2 TB. You normally decide whether to protect against one node failure, two node failures, and so on. If you are protecting against a single node failure with traditional data replication, each piece of data must exist somewhere else in the system to be recovered. So, with two-way replication, the 10 TB of raw storage provides 5 TB of usable protected storage, and 50% of the raw capacity is lost to replication. If you use EC 3/1 instead, one of every four stripes is parity, so only 25% of the raw capacity is lost and the 10 TB of raw storage provides 7.5 TB of usable storage. Measured against the protected data itself, the overhead is 1/3 ≈ 33%, compared with 100% for two-way replication. In general, with EC k/m, the overhead relative to the data is m/k, expressed as a percentage. For the usual schemes, where m is smaller than k, this is always smaller than the overhead of two-way or three-way replication.
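
Overhead can be quoted in two ways: relative to the protected data (m/k) or as a share of raw capacity (m/(k+m)). The sketch below computes both, along with usable capacity. The function name `ec_capacity` is hypothetical, and treating two-way replication as k = 1, m = 1 and three-way replication as k = 1, m = 2 is a modeling shortcut used here only to put replication and EC on the same scale.

```python
def ec_capacity(raw_tb, k, m):
    """Return (usable TB, overhead relative to data, share of raw capacity)."""
    usable_tb = raw_tb * k / (k + m)   # portion of raw capacity holding data
    overhead_vs_data = m / k           # extra storage per unit of data
    raw_share = m / (k + m)            # portion of raw capacity spent on protection
    return usable_tb, overhead_vs_data, raw_share


schemes = {
    "two-way replication": (1, 1),
    "three-way replication": (1, 2),
    "EC 3/1": (3, 1),
    "EC 4/2": (4, 2),
}

for name, (k, m) in schemes.items():
    usable_tb, vs_data, raw_share = ec_capacity(10, k, m)
    print(f"{name}: {usable_tb:.1f} TB usable, "
          f"{vs_data:.0%} overhead vs. data, {raw_share:.0%} of raw capacity")
```

When comparing schemes, use the same measure on both sides: a figure quoted as a share of raw capacity is not directly comparable with one quoted relative to the data.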

Erasure coding provides great storage efficiency, but at the expense of added computation for parity. The more parity stripes there are, the more computation is needed to calculate the parity and to restore the original data. As a standard practice, the number of parity stripes is either m = 1, protecting against one node failure (RAID 5), or m = 2, protecting against two node failures (RAID 6), to parallel the replication factors normally used in hyperconverged infrastructure (HCI) implementations. HCI implementations must weigh the storage savings of EC against the performance degradation it can cause.








Hyperconverged Infrastructure Data Centers – Sam Halabi 2019





