LT Codes were originally designed for a broadcaster-subscriber model, in which the broadcaster (say, the NFL streaming a live game, or Steam serving a video game demo to users) has a full copy of the file/stream and generates new LT chunks on the fly in an “endless stream” of chunks, while subscribers can tune in to the stream or begin downloading the file at any time.

If you think about it, this would be impossible using a system like BitTorrent since each chunk only contains a specific piece of the file; thus, all the subscribers would need to begin downloading at the same time in order for everyone to get the full file. One workaround might be for the broadcaster to have several streams going at once at different starting times, so that a subscriber could just wait for the next stream to start their download. But this obviously uses a lot more resources and is much less flexible and convenient for the subscriber.

While exploring possible ways to store the art image files for Animecoin, I realized that it would be fairly straightforward to adapt this model to a more generic file transfer/retrieval context, where several users on the network have varying numbers of chunks downloaded and a few users have the entire file so that they can generate fresh chunks. To illustrate how this might work, we can take another Animecoin art image as an example. The file shown below is a 20.1 megabyte PNG image file:

An example image file that we are going to encode into LT chunks.

Along with the image file, we will also encode the HTML metadata ticket, which looks similar to this example. We combine the image file (there can be more than one image file in a given artwork, but for this example we will restrict ourselves to a single image) and the metadata ticket file into what is called a TAR file, and then compress the TAR file using Zstandard, a new compression library developed at Facebook over the past few years. Since the PNG image file is already compressed, we do this step primarily to get a single file to work with that doesn’t contain any redundant data. The code for these preliminary steps is fairly simple, and can be seen here:

Getting the files ready for encoding into LT chunks.
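For readers who can't make out the screenshot, the idea boils down to something like the following minimal sketch, using Python's built-in tarfile module and the zstandard package (the file paths, function name, and compression level here are illustrative assumptions, not necessarily what the demo uses):

```python
import tarfile
import zstandard as zstd  # pip install zstandard

def prepare_artwork_archive(file_paths, tar_path, compressed_path):
    """Bundle the image and metadata files into a single TAR archive,
    then compress that archive with Zstandard."""
    # Pack all of the input files into one TAR archive:
    with tarfile.open(tar_path, 'w') as tar:
        for path in file_paths:
            tar.add(path)
    # Compress the TAR archive with Zstandard (level is an arbitrary choice here):
    cctx = zstd.ZstdCompressor(level=15)
    with open(tar_path, 'rb') as f_in, open(compressed_path, 'wb') as f_out:
        cctx.copy_stream(f_in, f_out)
    return compressed_path

# Example usage with placeholder file names:
# prepare_artwork_archive(['artwork.png', 'metadata_ticket.html'],
#                         'artwork_bundle.tar', 'artwork_bundle.tar.zst')
```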

Now, our goal is to transform this single compressed file into a collection of LT chunks. Then we will show that, even if we randomly delete or corrupt the majority of these LT chunk files, we can nevertheless reliably reconstruct the original files. If you would like to try out the Python source code yourself, you can find a self-contained demo here that is under 400 lines of code.

You can replace the image and metadata files that I used with any set of files and the demo should work in the same way. Anyway, once we have completed this initial step of creating a single compressed file from the data we want to store, we must then specify two parameters that control how the data is encoded into LT chunks.

The first of these is the size of each chunk file; here, we want to strike the right balance between the inconvenience of dealing with lots of small files and the bandwidth wasted by oversized chunks. That is, if the chunks are too large relative to the size of the data we are encoding (in this case, around 20 megabytes), we could easily end up needing 30+ megabytes worth of chunks to reconstruct the file. Since most digital image files that would be registered on the Animecoin network are probably on the order of 10 to 30 megabytes, I am using a 2 megabyte chunk size.

The other key parameter is the desired redundancy factor. What this controls is the total size of all the chunk files that we initially create. So if the original data in this case is ~20 megabytes, and we use a desired redundancy factor of 12, we will create 20 x 12 = 240 megabytes worth of chunks; at a 2 megabyte chunk size, this means that we will start out with around 120 chunk files. The code excerpt below shows the various parameters, including the ones that specify the location of the files we want to encode and where to store the resulting chunk files:

The various parameters we must specify when encoding the files into LT chunks.
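In plain Python, these parameters amount to something like the sketch below (the variable names are placeholders, not necessarily the ones used in the demo):

```python
import os

# Illustrative parameter values:
input_file_path = 'artwork_bundle.tar.zst'     # the compressed archive to encode
block_storage_folder_path = 'lt_chunk_files/'  # where the generated chunks will go
chunk_size_in_bytes = 2 * 1024 * 1024          # 2 megabyte chunk files
desired_redundancy_factor = 12                 # create ~12x the original data in chunks

original_size = os.path.getsize(input_file_path)
total_chunk_bytes = original_size * desired_redundancy_factor
number_of_chunks = -(-total_chunk_bytes // chunk_size_in_bytes)  # ceiling division
# For a ~20 MB file: 20 MB x 12 = 240 MB of chunks, i.e. ~120 files at 2 MB each.
```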

The fact that we have many more chunks than are required to reconstruct the files is the first layer of protection for data stored in the Animecoin network — it is what allows us to lose a large portion of these chunks and still be able to recover the files. The second layer of protection, which is equally important, is that these chunk files will be uploaded to the network of Animecoin Masternodes so that each distinct chunk file will be stored by at least 10 randomly selected Masternodes. This two-tier data protection scheme is what gives the network its robustness and protects the art files from “Black Swan” type events, where huge portions of the infrastructure could vanish without warning. Such a risk is not just theoretical: for example, suppose that at some point in the future, the price of Animecoin drops significantly, leading Animecoin Masternodes to liquidate their holdings and shut down their servers en masse. We need to be prepared for this situation.
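To make the second layer concrete, here is a hypothetical sketch of how each chunk might be assigned to its ten Masternodes; the helper name and data structures are my own for illustration, not part of the actual Animecoin code:

```python
import random

def assign_chunks_to_masternodes(chunk_hashes, masternode_ids, copies_per_chunk=10):
    """Assign each distinct LT chunk to a set of randomly selected Masternodes,
    so that every chunk is stored by at least `copies_per_chunk` machines.
    Assumes there are at least `copies_per_chunk` active Masternodes."""
    assignments = {}
    for chunk_hash in chunk_hashes:
        assignments[chunk_hash] = random.sample(masternode_ids, copies_per_chunk)
    return assignments
```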

The final step to truly protect the art files is to make the system dynamic. That is, even if a newly registered artwork begins with the desired 12x redundancy factor, and each chunk is spread across several independently controlled Masternodes, this redundancy might erode over time as machines shut down or chunk files are deleted or corrupted. To combat this, Masternodes will be randomly selected by the Animecoin system to audit each registered artwork and ensure that it still meets the minimum desired redundancy level.

The way this works is that the Masternode chosen to verify a given artwork will check the file-servers of all the active Masternodes (these are hosted using the high-performance NGINX web-server, with access limited to the IP addresses of valid Masternodes), looking for LT chunks that correspond to the file hash of the artwork it is checking. The Masternode then computes the overall redundancy level by adding up the size in megabytes of the distinct LT chunks it is able to locate (after verifying that the chunks are valid) and comparing the aggregate size of these chunks to the original size of the encoded files.
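The redundancy computation itself is just a ratio; here is a self-contained sketch (the function name is mine):

```python
def compute_empirical_redundancy(located_chunk_sizes, original_size_in_bytes):
    """Sum the sizes of the distinct, validated LT chunks located across the
    Masternode file-servers, and compare against the original file size."""
    return sum(located_chunk_sizes) / original_size_in_bytes

# e.g. 70 distinct 2 MB chunks found for a 20 MB artwork -> redundancy of 7.0:
print(compute_empirical_redundancy([2 * 2**20] * 70, 20 * 2**20))  # 7.0
```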

If the resulting empirically verified redundancy level is below the target redundancy level, the chosen Masternode will reconstruct the original file and then proceed to generate fresh LT chunks, advertising the new chunks to the other Masternodes so they will be mirrored on multiple machines. In this way, the storage network is dynamic and self-healing: it can take a beating without dying, and will gradually recover to its initial strength, at which point it can handle another beating without ever losing the precious art file data.
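Continuing the sketch above, one way the healing step could decide how much new data to generate looks roughly like this (again, my own naming and a simplification, not the actual implementation):

```python
import math

def fresh_chunks_needed(current_redundancy, original_size, chunk_size,
                        target_redundancy=12.0):
    """How many fresh LT chunks must be generated to restore the target
    redundancy level for an artwork whose redundancy has eroded."""
    if current_redundancy >= target_redundancy:
        return 0  # the artwork is still sufficiently replicated
    missing_bytes = (target_redundancy - current_redundancy) * original_size
    return math.ceil(missing_bytes / chunk_size)

# Redundancy eroded from 12x to 7x on a 20 MB file, with 2 MB chunks:
print(fresh_chunks_needed(7.0, 20 * 2**20, 2 * 2**20))  # -> 50 new chunks
```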

How will we know that we have accurately reconstructed the original files and not somehow introduced subtle errors? At the time of encoding the original files, when we create the single compressed archive containing the image/metadata files, we will compute the SHA3-256 hash of the compressed archive. This file hash, along with the file hash of each chunk file we generate, will be encoded in the “header” of each chunk file. This means that we will always be able to detect if any chunks are corrupted, and we will know for sure at the end that we have the original files, because the hash of the reconstructed file will match the file hash for that art asset recorded in the Animecoin blockchain.
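Computing that hash takes only a few lines with Python's standard library; a quick sketch:

```python
import hashlib

def sha3_256_of_file(path):
    """Compute the SHA3-256 hash of a file, reading it in 1 MB blocks."""
    hasher = hashlib.sha3_256()
    with open(path, 'rb') as f:
        for block in iter(lambda: f.read(1024 * 1024), b''):
            hasher.update(block)
    return hasher.hexdigest()

# At encoding time we record the archive's hash; after reconstruction we
# recompute it and compare against the hash recorded in the blockchain:
# assert sha3_256_of_file('reconstructed.tar.zst') == recorded_file_hash
```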