In 1998, Michael Luby published the Luby Transform Code (LT Code), an error correction algorithm belonging to the “fountain” code class, a type of corrector code that can generate an infinite number of packets to reconstitute data lost during transfers across different networks.

This type of algorithm is particularly useful in the case of transmission without any way to ask to recover the data again, for example:

Where there is physically only one way of communication ( a network diode, in cybersecurity );

); Where the receiver does not have enough energy or is not powerful enough to request the data again ( a spatial transmission );

); Where any sequence of packets can be received in any order and distributed completely randomly between the receivers ( Peer-To-Peer transmission );

); Etc.

It is not the only type of algorithm to solve this kind of problem. The field of error correction codes is very vast, and the purpose of this article is only to propose a simple approach to the operation of LT Codes. Other fountain codes now have better performance, for example Raptor Codes / Online Codes et RaptorQ.

We will then see the basic operation of LT Codes and how to implement it with Python in an optimized way for high-speed communications with speeds from 100 MB/s to 300 MB/s. The code is available on Github.

The idea behind LT codes and how it works

Two principles are fundamental: LT Codes must generate new packets –called symbols– on the fly, and several of these symbols combined together may reconstruct the original data.

The symbol creation (or encoding) consists in randomly taking initial data blocks and combining them together. The operation used is the XOR (as ) for its ability to perform the reverse operation. For example, is a symbol such as , then:

Knowing , we find such that ;

, we find such that ; Knowing , we find such that .

Let’s take the following example, where we share our data in blocks of the same size and randomly draw 6 combinations:

So we have:

Like other fountain codes, the encoding and decoding process of LT Codes are based on graph theory and can be represented by a bipartite graph (figure 1). The degree of a symbol is defined as its number of links: has 1 degree, and have 2 degrees and has 3 degrees.

After sending the symbols in packets (where each symbol is transmitted with its associated block list), the receiver receives the packets except the packet containing the symbol , lost :

We then only have to decode by looking for the symbols that correspond directly to a block and then partially solve the other symbols that are associated with the same block.

Observing that , it is then possible to XOR and with the value of (step 1). Thanks to this, solves of which it is directly equal. We can then XOR and with (step 2). The two symbols and allow us to solve and , the last two unknown blocks (step 3).

In the end, the decoding will have been done optimally: received symbols (or packets) will have been sufficient to decode the blocks. The symbol will have been useless and the loss of the symbol will not have been penalizing. This whole process of decoding is sometimes called “belief-propagation“.

There are several conditions necessary for a good decoding:

The decoder must have at least symbols if blocks are to be decoded.

symbols if blocks are to be decoded. At least one of the symbols must be of degree 1 (i.e. have a single link), to decode a block that will itself be used to decode other symbols, and so on. This is the “entry point” of the decoder.

In practice, if the transmission is subject to a high packet loss rate then it is better to send several symbols of degree 1 to have several “entry points”. On the other hand, sending only symbols of degree 1 means sending the data without encoding and the interest of the corrector code is lost.

In another way, the higher the degree a symbol has, the more useful it will be, on its own, for the recovery of several blocks if its other blocks are resolved. For example, allows us to recover , or if one of these 3 is missing and the others are already solved.

Lastly, given the non-deterministic character of LT Codes, it is not certain that it is possible to resolve all blocks with given symbols if the random draw is unlucky. Let’s take the following example:

In this case, only the degree was changed from 2 to 1, preventing complete decoding. Increasing the degrees of the other symbols is a solution to solve the problem. The choice of the right degree to encode symbols is in fact a very important research axis of LT Codes and represents one of the most important optimization levers of the algorithm.

Another technique solves this problem by temporarily disabling links from certain blocks. For example, disabling links then can partially resolve and can partially resolve , and so on. This technique, called “inactivation decoding” requires more complex mathematics and will not be discussed here.

oPTIMIZING DEGREES: ideal soliton distribution

The goal of the game is to determine how often degree symbols will be created. Michael Luby first presents in his research a distribution called “ideal soliton distribution”, whose probability of obtaining a degree is such that :

Small degrees are favored, with emphasis on creating symbols from 2 blocks. Also note the 0 probability of having a symbol from 0 blocks (which is useless). Here is its implementation in Python:

def ideal_distribution(N): """ Create the ideal soliton distribution. In practice, this distribution gives not the best results Cf. https://en.wikipedia.org/wiki/Soliton_distribution """ probabilities = [0, 1 / N] probabilities += [1 / (k * (k - 1)) for k in range(2, N+1)] probabilities_sum = sum(probabilities) assert probabilities_sum >= 1 - EPSILON and probabilities_sum <= 1 + EPSILON, "The ideal distribution should be standardized" return probabilities

In the idea, this distribution minimizes the number of symbols to create before it is possible to start a decoding. In practice, the distribution does not give good results and many decodings remain blocked as long as there has not been a symbol created from a missing block, a possibility accentuated by the fact that small degrees are favored.

Optimizing degrees: Robust soliton distribution

Another distribution, called “robust soliton distribution” aims to mix the “small” degrees with “larger” degrees as we assumed at the end of the previous part. The idea is to add a peak probability for a degree (with ), and smooth the rest. The values are defined as:

With a parameter introducing a probability of decoding failure. Then, since the sum of the probabilities must be equal to 1, we normalize :

So the distribution favors both small degrees and high degrees. In the example above, we chose 13. Generally speaking, choosing a equivalent to about half the number of blocks seems to be a reasonable value. Here is the implementation of the Python distribution:

ROBUST_FAILURE_PROBABILITY = 0.01 def robust_distribution(N): """ Create the robust soliton distribution. This fixes the problems of the ideal distribution Cf. https://en.wikipedia.org/wiki/Soliton_distribution """ # The choice of M is not a part of the distribution ; it may be improved # We take the median and add +1 to avoid possible division by zero M = N // 2 + 1 R = N / M extra_proba = [0] + [1 / (i * M) for i in range(1, M)] extra_proba += [math.log(R / ROBUST_FAILURE_PROBABILITY) / M] # Spike at M extra_proba += [0 for k in range(M+1, N+1)] probabilities = np.add(extra_proba, ideal_distribution(N)) probabilities /= np.sum(probabilities) probabilities_sum = np.sum(probabilities) assert probabilities_sum >= 1 - EPSILON and probabilities_sum <= 1 + EPSILON, "The robust distribution should be standardized" return probabilities

This solution is effective since part of the symbols are used directly to build blocks (small degrees) and another part of the symbols is used to compensate for bad reception and impossible decoding (large degrees).

Systematic LT Codes

Some fountain codes are called “systematic” or “non-systematic” (cf. Systematic Code), and some codes may be one or the other: this is the case of Raptor Codes and also the case of LT Codes.

In code theory, a systematic code means that the data is encoded directly as it is and that the first symbols correspond exactly to the first blocks (and therefore, degrees 1). This is like sending the complete data once and then encoding the redundancy.

So far, we have encoded non-systematically. If we use our example again, then only the symbols and would come from several blocks at once:

In the case where packet loss is minimal, then systematically encoding avoids wasting time encoding and decoding symbols.

The symbols resulting from the combination of many blocks (therefore with a high degree) –as here the symbols and – offer then the possibility to recover many different blocks with only one symbol.

ENCODING

In this part, I propose to you to implement the encoding in Python with a particular attention to its speed of execution. You can find the complete code on Github.

In the hypothesis where the data to encode comes from a file, the first thing to do is to read the data and store it in an appropriate way :

core.PACKET_SIZE = 65536 ... def blocks_read(file, filesize): """ Read the given file by blocks of `core.PACKET_SIZE` and use np.frombuffer() improvement. Byt default, we store each octet into a np.uint8 array space, but it is also possible to store up to 8 octets together in a np.uint64 array space. This process is not saving memory but it helps reduce dimensionnality, especially for the XOR operation in the encoding. Example: * np.frombuffer(b'x01x02', dtype=np.uint8) => array([1, 2], dtype=uint8) * np.frombuffer(b'x01x02', dtype=np.uint16) => array([513], dtype=uint16) """ blocks_n = math.ceil(filesize / core.PACKET_SIZE) blocks = [] # Read data by blocks of size core.PACKET_SIZE for i in range(blocks_n): data = bytearray(file.read(core.PACKET_SIZE)) if not data: raise "stop" # The last read bytes needs a right padding to be XORed in the future if len(data) != core.PACKET_SIZE: data = data + bytearray(core.PACKET_SIZE - len(data)) assert i == blocks_n-1, "Packet #{} has a not handled size of {} bytes".format(i, len(blocks[i])) # Paquets are condensed in the right array type blocks.append(np.frombuffer(data, dtype=core.NUMPY_TYPE)) return blocks

Each data block read with the bytes type (immutable) is cast to the bytearray type (mutable) to then allow the XOR operation between two blocks. Since all blocks must be the same size and that the input file may not allow equal splitting, a padding to the right is added to the last block. In concrete terms, this means that if this last block is not the right size, 0’s are added at the end.

We use the np.from_buffer() function which allows to store 64 bits tables instead of 8 bits (default). This reduces the bytearray dimensionality and reduces the time required for a XOR.

A Symbol class will also be defined for convenience. For each symbol to create, the iterative encoding follows the following steps :

Randomly draw a degree deg according to the degree distribution.

according to the degree distribution. Select uniformly at random a quantity deg of distinct blocks, using as seed the index of the symbol to create.

of distinct blocks, using as seed the index of the symbol to create. Combine the blocks by XORing themselves to obtain the new symbol.

The encode() function is the following:

class Symbol: __slots__ = ["index", "degree", "data", "neighbors"] # fixing attributes may reduce memory usage def __init__(self, index, degree, data): self.index = index self.degree = degree self.data = data def get_degrees_from(distribution_name, N, k): """ Returns the random degrees from a given distribution of probabilities. The degrees distribution must look like a Poisson distribution and the degree of the first drop is 1 to ensure the start of decoding. """ if distribution_name == "ideal": probabilities = ideal_distribution(N) elif distribution_name == "robust": probabilities = robust_distribution(N) else: probabilities = None population = list(range(0, N+1)) return [1] + choices(population, probabilities, k=k-1) def encode(blocks, drops_quantity): # Display statistics blocks_n = len(blocks) assert blocks_n <= drops_quantity, "Because of the unicity in the random neighbors, it is need to drop at least the same amount of blocks" # Generate random indexes associated to random degrees, seeded with the symbol id random_degrees = get_degrees_from("robust", blocks_n, k=drops_quantity) for i in range(drops_quantity): # Get the random selection, generated precedently (for performance) selection_indexes, deg = generate_indexes(i, random_degrees[i], blocks_n) # Xor each selected array within each other gives the drop (or just take one block if there is only one selected) drop = blocks[selection_indexes[0]] for n in range(1, deg): drop = np.bitwise_xor(drop, blocks[selection_indexes[n]]) # drop = drop ^ blocks[selection_indexes[n]] # according to my tests, this has the same performance # Create symbol, then log the process symbol = Symbol(index=i, degree=deg, data=drop) yield symbol

The symbol index is used as a seed when randomly drawing blocks to allow not to send the complete list of blocks used (or neighbors in the code), but only the index so that the graph is rebuilt on the decoder side. This trick is often used in corrector codes and was proposed to me by @catid who himself used it in his fountain code implementation, Wirehair.

Creating an array outside the loop containing randomly drawn degrees improves code speed by reducing the number of calls to random.sample() . In other words, it is better to draw several random numbers once than draw several times only one random number.

My tests didn’t find any difference between the Python operator ^ and np.bitwise_xor() to XORer bytearray between them. This is, by the way, the slowest encoding operation.

The code uses yield to directly transfer packaged symbols directly to the network, without storing them in RAM memory.

Decoding

Unlike encoding, which allows symbols to be sent on the fly as soon as they are created, here the decoding function must be called after the symbols are received and then processes them all together.

The first thing to do is to reconstruct the graph using the received symbol indexes. The links (here neighbors ) are saved in Python set objects:

def recover_graph(symbols, blocks_quantity): """ Get back the same random indexes (or neighbors), thanks to the symbol id as seed. For an easy implementation purpose, we register the indexes as property of the Symbols objects. """ for symbol in symbols: neighbors, deg = generate_indexes(symbol.index, symbol.degree, blocks_quantity) symbol.neighbors = {x for x in neighbors} symbol.degree = deg return symbols

Then, we implement the iterative decoding procedure following the steps :

Look for a symbol to decode whose degree is equal to 1 If such a symbol exists, go to (2) Otherwise, the decoding failed. Since the symbol has only one degree, it has only one associated block. So the data of the block to recover is directly the data of the symbol. The block being solved, we can then look for the symbols linked to the block and delete their link after having XORed their data. Here, the function concerned is called reduce_neighbors() . If all blocks have been resolved, stop decoding. If not, repeat the process at (1).

The decoding() function implements full decoding while reduce_neighbors() propagates block resolution to symbols.

def decode(symbols, blocks_quantity): symbols_n = len(symbols) assert symbols_n > 0, "There are no symbols to decode." # We keep `blocks_n` notation and create the empty list blocks_n = blocks_quantity blocks = [None] * blocks_n # Recover the degrees and associated neighbors using the seed (the index, cf. encoding). symbols = recover_graph(symbols, blocks_n) solved_blocks_count = 0 iteration_solved_count = 0 start_time = time.time() while iteration_solved_count > 0 or solved_blocks_count == 0: iteration_solved_count = 0 # Search for solvable symbols for i, symbol in enumerate(symbols): # Check the current degree. If it's 1 then we can recover data if symbol.degree == 1: iteration_solved_count += 1 block_index = next(iter(symbol.neighbors)) symbols.pop(i) # This symbol is redundant: another already helped decoding the same block if blocks[block_index] is not None: continue blocks[block_index] = symbol.data # Update the count solved_blocks_count += 1 # Reduce the degrees of other symbols that contains the solved block as neighbor reduce_neighbors(block_index, blocks, symbols) return np.asarray(blocks), solved_blocks_count def reduce_neighbors(block_index, blocks, symbols): """ Loop over the remaining symbols to find for a common link between each symbol and the last solved block `block` To avoid increasing complexity and another for loop, the neighbors are stored as dictionnary which enable to directly delete the entry after XORing back. """ for other_symbol in symbols: if other_symbol.degree > 1 and block_index in other_symbol.neighbors: # XOR the data and remove the index from the neighbors other_symbol.data = np.bitwise_xor(blocks[block_index], other_symbol.data) other_symbol.neighbors.remove(block_index) other_symbol.degree -= 1

Using Python sets is useful for improving the performance of the reduce_neighbors() function, since it is possible to access a symbol link via its index (e.g. symbol.neighbors[block_index] ) rather than running symbol.neighbors through a loop like an array.

In algorithmic complexity, if is the degree of a symbol then this optimization allows to shift from to .

Once all the blocks have been decoded, just convert the bytearray into bytes and write them in a file:

def blocks_write(blocks, file, filesize): """ Write the given blocks into a file""" count = 0 for data in recovered_blocks[:-1]: file_copy.write(data) count += len(data) # Convert back the bytearray to bytes and shrink back last_bytes = bytes(recovered_blocks[-1]) shrinked_data = last_bytes[:filesize % core.PACKET_SIZE] file_copy.write(shrinked_data)

Benchmarks and performance

Implemented as is, the encoding and decoding of a file of less than 1MB is instantaneous. In “systematic” mode with 1.5x redundancy, the transfer speed evolves from 300 MB/s to 10 MB/s for files ranging from 100 MB to a few GB.

As an indication, here are several encoding and decoding time measurements on different files:

Size (MB) Blocks Symbols Encoding Decoding Time (s) Speed (MB/s) Time (s) Speed (MB/s) 1 16 24 0.00 – 0.00 – 100 1600 2400 0.21 476.1 0.31 322.5 1200 19200 28800 3.86 310.8 39.82 30.1 2000 32000 48000 6.44 310.5 104.10 19.2 3600 57600 86400 23.14 155.5 426.36 8.4

Not surprisingly, it’s the decoding part and its algorithmic complexity in minimum that limits performance.

Quickly, we notice that decoding becomes impracticable the more the size –and therefore the number of symbols to manage– increases. A simple solution to overcome this limitation would be to undercut a file into several chunks.

Conclusion

The purpose of this article was to approach the fountain codes with an example, the LT Codes. We have seen how encoding symbols from several data blocks could correct several errors at once.

We also have seen the limitations of iterative decoding, which becomes slower and slower as the size of the data to be encoded increases. However, other more powerful decoding algorithms have since been proposed (a simple Google search with the keyword “LT Codes” shows how active the research is).

Other correction codes also exist. Raptor Codes are for example an evolution of LT Codes which allow a linear decoding whatever the size of the file.

You can find the full implementation on Github, as well as the benchmark script and its results.

References that helped to write this article: