The Jortage Storage Pool is an S3-compatible frontend to Wasabi + BunnyCDN that performs file-level deduplication, designed for fediverse instances. File-level deduplication is invaluable in the fediverse, as media posted by any user gets eagerly copied to any Mastodon instance that receives it, and has saved our 41 members 61.92% on storage.

The juicy details

When media is uploaded to Jortage, it is received by our "ingest server", running the open-source Jortage Storage Pool Manager, or poolmgr for short. Poolmgr hashes media being uploaded to it on-the-fly with SHA-512, while simultaneously streaming it to a temporary file. Once the entire file has been uploaded, poolmgr checks whether a file with this SHA-512 hash has been uploaded before. If so, the data is discarded. If not, it is uploaded to our backing store (Wasabi, currently). In both cases, the hash is then associated with the requested path in a database table called the "name map".
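The flow above can be sketched in a few lines of Python. This is only an illustration of the idea, not poolmgr's actual code: the `Pool` class, its `blobs` and `name_map` dictionaries, and the example paths are all made up for this sketch, and a dictionary stands in for the backing store.

```python
import hashlib
import tempfile


class Pool:
    """Toy in-memory stand-in for the backing store plus the name map (hypothetical)."""

    def __init__(self):
        self.blobs = {}     # hex SHA-512 -> file bytes (stands in for the backing store)
        self.name_map = {}  # requested path -> hex SHA-512

    def ingest(self, path, chunks):
        """Hash an upload while spooling it to a temp file, then deduplicate."""
        hasher = hashlib.sha512()
        with tempfile.TemporaryFile() as tmp:
            for chunk in chunks:
                hasher.update(chunk)  # hash on the fly...
                tmp.write(chunk)      # ...while streaming to a temporary file
            digest = hasher.hexdigest()
            is_new = digest not in self.blobs
            if is_new:
                # Only previously unseen content reaches the backing store.
                tmp.seek(0)
                self.blobs[digest] = tmp.read()
            # Otherwise the temp file is simply discarded when the block exits.
        # In both cases, the requested path is associated with the hash.
        self.name_map[path] = digest
        return digest, is_new


pool = Pool()
media = [b"fake image ", b"bytes"]
first = pool.ingest("instance-a/media/1.png", media)
second = pool.ingest("instance-b/cache/9.png", media)
print(first[1], second[1], len(pool.blobs), len(pool.name_map))
```

Two paths end up in the name map, but the content is stored once; the second upload only costs a name map entry.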

Ultimately, this means if the same file is uploaded 100 times, the total extra used space is on the order of handfuls of bytes rather than 100 copies of the actual file. This is invaluable for use with Mastodon, since Mastodon creates local cached copies of all media it receives. Wasabi, the most affordable S3-compatible server, bills every stored file for a minimum of 90 days, so aggressively purging remote media from your instance won't save you either. Local storage is no way out, either: if your instance is sufficiently large, it will run out and not be cost-effective to expand, even without remote media to bloat it.

Let's talk about a simplified real-world example. Someone with 1,000 followers, across 100 instances, makes a post with 4 media attachments, 2MB each. The original instance pushes the status to those 100 instances, and they all immediately download the media from the original instance. This causes a surge of traffic, totalling 800MB. All these instances then upload this media to their storage provider. If several of those instances are using Wasabi, then Wasabi itself performs deduplication, but doesn't share the benefit. The fediverse has just grown by 808MB (800MB of copies plus the original 8MB), and any Wasabi-using instances have to pay for their share for at least 90 days, even if they delete it.

Let's say 10 of those instances use the Jortage Storage Pool, and so does the originating instance. The standard case has the same surge in traffic, but instead of 80MB being stored for those 10 instances, 0MB is stored, because Jortage already has the files. Additionally, the traffic surge is absorbed by our CDN, which is designed for precisely this kind of issue. If the original instance doesn't use Jortage, then only 8MB is stored. The traffic surge is difficult to prevent due to how Mastodon's media upload system works, and how S3 works. However, if we're willing to modify how Mastodon's media upload process works...
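The numbers in this example are easy to check. The snippet below just redoes the arithmetic from the scenario; the variable names are ours, and the values come straight from the text.

```python
ATTACHMENTS = 4            # media attachments on the post
ATTACHMENT_MB = 2          # size of each attachment
RECEIVING_INSTANCES = 100  # instances that receive the post
POOLED_INSTANCES = 10      # of those, instances using the Jortage Storage Pool

post_mb = ATTACHMENTS * ATTACHMENT_MB     # media carried by one copy of the post
surge_mb = RECEIVING_INSTANCES * post_mb  # every instance downloads its own copy

# Storage needed for the 10 pooled instances under three setups:
without_pool_mb = POOLED_INSTANCES * post_mb  # each keeps a separate copy
origin_in_pool_mb = 0                         # the pool already holds the files
origin_outside_pool_mb = post_mb              # one shared copy gets added

print(post_mb, surge_mb, without_pool_mb, origin_in_pool_mb, origin_outside_pool_mb)
```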

The better case is if instances use the Jortage Rivet API. In this case, instances simply send the URL of the original file to our ingest server, which then either downloads the file on their behalf and stores it in the pool, or recognizes that the file has been downloaded recently and skips the download entirely. This brings the cost of any-to-Jortage federation from N downloads to 1. Rivet also supports a cheaper upload API, where the instance calculates the SHA-512 before making the request, and never uploads the file if it's already in the pool.
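To make the "N to 1" claim concrete, here is a toy in-memory model of both ideas. It is not the real Rivet API: the `ToyRivet` class, its `retrieve` and `upload` methods, and the example URL are all hypothetical. It also makes assumptions of its own: a plain function stands in for an HTTP download, "recently" is modelled as "ever", and the check that uploaded data matches the claimed hash is our guess.

```python
import hashlib


class ToyRivet:
    """Hypothetical in-memory model of the two ideas above; not the real Rivet API."""

    def __init__(self, fetch):
        self.fetch = fetch     # callable: url -> bytes (stands in for an HTTP GET)
        self.url_to_hash = {}  # URLs already retrieved (this toy never expires them)
        self.blobs = {}        # hex SHA-512 -> file bytes
        self.downloads = 0     # how many times the origin was actually contacted

    def retrieve(self, url):
        """Pool the file at `url`, downloading it at most once."""
        if url not in self.url_to_hash:
            data = self.fetch(url)
            self.downloads += 1
            digest = hashlib.sha512(data).hexdigest()
            self.blobs.setdefault(digest, data)
            self.url_to_hash[url] = digest
        return self.url_to_hash[url]

    def upload(self, digest, read_file):
        """Hash-first upload: `read_file` is only called if the pool lacks `digest`."""
        if digest in self.blobs:
            return False  # already pooled, nothing is transferred
        data = read_file()
        # Assumption: the server checks that the data matches the claimed hash.
        if hashlib.sha512(data).hexdigest() != digest:
            raise ValueError("uploaded data does not match the claimed hash")
        self.blobs[digest] = data
        return True


origin = {"https://origin.example/media/1.png": b"fake image bytes"}
rivet = ToyRivet(fetch=origin.__getitem__)

# Ten instances ask for the same remote file; the origin is contacted once.
for _ in range(10):
    rivet.retrieve("https://origin.example/media/1.png")
print(rivet.downloads)
```

In this model, an instance that already holds the file locally would compute the SHA-512 itself and call `upload`, which returns without reading the file when the pool already has that hash.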

Note: Rivet is currently experimental and a Mastodon "plugin" to make use of it is in the works.