Off-Chain Data

Why traditional clouds don’t work for blockchain data

Imagine you’re a blockchain engineer tasked with building an application that trades assets on-chain. Let’s pretend the asset is pinatas, so you build a pinata exchange to buy and sell them. For this example, it doesn’t matter whether the pinatas are real-world or digital. To facilitate transactions, you can create a smart contract or tokenize the pinatas as non-fungible tokens (NFTs). If NFTs are new to you, be sure to learn more about them or build one yourself with OpenSea’s tutorial.

Once you’ve figured out how you will transact on-chain, you have to figure out what data needs to be associated with the pinatas on the exchange. You could use photos, video, or IoT data from sensors. You then take that data and try to store it on-chain. Uh oh! It’s far too inefficient to store that much data on-chain. A blockchain network replicates its data thousands of times across the globe, which most data doesn’t need and which is prohibitively expensive. It would cost more to store the data on-chain than the pinata is worth! So, what do you do? The best option, and the most common practice, is to move as much data “off-chain” as possible and only reference it on-chain. For example, you could store an ID for each pinata on-chain and keep the rest of the data, like the photos, videos, or IoT data mentioned earlier, off-chain, where the on-chain record references it. Problem solved.
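As a rough sketch of that split, here is what the two sides might hold. Everything in it is an assumption for illustration: OnChainPinata, list_pinata, and the in-memory off_chain_store dictionary are hypothetical stand-ins, not a real smart contract or a Pinata API.

```python
from dataclasses import dataclass

# Hypothetical stand-in for off-chain storage: reference -> raw bytes.
off_chain_store: dict[str, bytes] = {}


@dataclass(frozen=True)
class OnChainPinata:
    """The only fields that would live on-chain in this sketch."""
    pinata_id: int
    data_ref: str  # pointer to the off-chain data (photos, video, IoT data)


def list_pinata(pinata_id: int, data_ref: str, data: bytes) -> OnChainPinata:
    """Store the bulky data off-chain and return the small on-chain record."""
    off_chain_store[data_ref] = data
    return OnChainPinata(pinata_id=pinata_id, data_ref=data_ref)


# Placeholder bytes standing in for a large photo.
photo = b"\x89PNG..." * 1000
record = list_pinata(1, "pinata-1-photo", photo)
print(len(photo), "bytes off-chain;", len(record.data_ref), "characters referenced on-chain")
```

In this sketch the reference is just an arbitrary name. What that reference should actually look like is the subject of the rest of this post.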

Not quite. Unfortunately, traditional cloud options aren’t equipped to handle off-chain data in the cryptographically secure and distributed manner that blockchain apps require. Below, we list the qualities that matter most when handling off-chain data. We identified these requirements by talking with blockchain engineers who use Pinata.Cloud and the protocol we run on, IPFS.

Trust

First and foremost, blockchain apps need to be able to trust their off-chain data just as much as their on-chain data. More specifically, apps need to be able to trust the data itself, without needing to trust who or where that data comes from. This trust comes from data that is uniquely identifiable and accessible in a distributed manner.

IPFS generates a uniquely identifiable “URL” for data.

Uniquely Identifiable

First, blockchain apps need a guarantee that no two pieces of data can ever be confused with each other. IPFS handles this with its content addressability, explained in the video above. Content-addressable systems locate and handle data based on the data itself, not on where the data is located. This provides all sorts of advantages for blockchain applications, most notably the ability to detect whether data has been tampered with or changed. If the underlying data changes, it produces a completely new ID. This lets the data’s recipient recognize false data, because the content they received won’t produce the expected ID when validated. This quality is important for off-chain data: if on-chain records pointed to data that could silently change, the advantage of an immutable blockchain would be lost. For example, let’s pretend that instead of using a proper off-chain data system, an app has an on-chain asset referencing something like:

myapp.com/assets/my_content_12345.json.

The problem here is that my_content_12345.json could be changed at any time without anybody knowing. This is a massive security problem for applications that need to be able to trust the data they’re working with. With a proper off-chain data system, the on-chain asset would instead have a data reference that looks something like:

ipfs://Qma6e8dovfLyiG2UUfdkSHNPAySzrWLX9qVXb44v1muqcp

Notice how the reference is a unique identifier for the content? This means that the receiver of this data can easily validate the content they receive from the off-chain data network by checking that it produces the same content ID. Boom. Tamper-proof data.
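To make the validation step concrete, here is a minimal sketch. It assumes a simplified content ID, the hex SHA-256 digest of the raw bytes. A real IPFS CID is derived differently (it wraps a hash of IPFS's own data structure in a multihash encoding), so content_id and verify below are illustrative helpers, not IPFS's actual algorithm.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Simplified content ID: hex SHA-256 of the raw bytes (not a real IPFS CID)."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, expected_id: str) -> bool:
    """Return True only if the received bytes hash to the ID stored on-chain."""
    return content_id(data) == expected_id


original = b'{"name": "pinata", "color": "red"}'
on_chain_ref = content_id(original)  # the reference that would be stored on-chain

# Someone changes a single field in the off-chain data.
tampered = b'{"name": "pinata", "color": "blue"}'

print(verify(original, on_chain_ref))  # True
print(verify(tampered, on_chain_ref))  # False
```

The receiver never has to trust whoever served the bytes. Recomputing the ID locally is enough to accept or reject them.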

Distributed Accessibility

Data needs to be accessible in a distributed, P2P manner, just like the blockchain it is anchored to. This distributed accessibility should happen regardless of how “centralized” or “decentralized” the infrastructure underneath is. Distributed accessibility means you can pull data from a centralized cloud provider just the same as you can from a decentralized system like Filecoin. To reiterate, you should be able to access your data when and how you want, regardless of the infrastructure underneath. If you can’t, your data won’t have the distributed and P2P advantages of the blockchain it is anchored to.
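One way to picture this is a client that does not care which source answers, as long as the bytes match the reference. The sketch below assumes a simplified reference (the hex SHA-256 digest of the content, not a real IPFS CID) and hypothetical source functions standing in for a cloud provider, a peer, or any other host. No real network calls are made.

```python
import hashlib
from typing import Callable, Iterable, Optional

# A source is any function that takes a reference and returns bytes, or None.
Fetcher = Callable[[str], Optional[bytes]]


def fetch_verified(ref: str, sources: Iterable[Fetcher]) -> Optional[bytes]:
    """Try each source in turn; accept the first response whose hash matches ref."""
    for fetch in sources:
        data = fetch(ref)
        if data is not None and hashlib.sha256(data).hexdigest() == ref:
            return data
    return None


content = b"pinata photo bytes"
ref = hashlib.sha256(content).hexdigest()


# Hypothetical sources: one is unreachable, one returns the wrong bytes, one is honest.
def offline_node(_ref: str) -> Optional[bytes]:
    return None


def dishonest_host(_ref: str) -> Optional[bytes]:
    return b"something else"


def honest_peer(_ref: str) -> Optional[bytes]:
    return content


result = fetch_verified(ref, [offline_node, dishonest_host, honest_peer])
print(result == content)
```

Because the check depends only on the content, the sources are interchangeable. It makes no difference whether the honest copy came from centralized or decentralized infrastructure.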

Previously, my cofounder Matt Ober hinted at the concept of “off-chain” data in his post, “Ethereum and IPFS”. We’ve even seen IBM dip their toes into the concept with their paper, “Why new off-chain storage is required for blockchains”. TL;DR: they describe an IBM version of IPFS. In future posts, I will dive into how the requirements of off-chain data for blockchains today will become the standard of all clouds tomorrow.

If you are struggling with handling off-chain data, check out Pinata.Cloud and join our Slack.