IPFS (InterPlanetary File System) is awesome for storing data in a decentralised fashion, and it's one of the most commonly used tools for storing data in the blockchain space. Storing data on a blockchain is expensive and slow, while storing it in IPFS is free (apart from the cost of running a node) and fast. IPFS also avoids duplicating data: content is addressed by its hash, and hashing the same data twice (with the same import settings) gives back the same hash, so identical content is stored only once.
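To make the deduplication idea concrete, here is a minimal sketch of a content-addressed store in Python. It assumes the address is a plain SHA-256 digest of the raw bytes. Real IPFS chunks the data and produces a CID, so these hashes will not match IPFS hashes. The class name `ContentStore` is hypothetical.

```python
import hashlib


class ContentStore:
    """Hypothetical toy store: each blob is keyed by the SHA-256 hash of its bytes.

    Real IPFS builds a chunked DAG and returns a CID; this only shows why
    identical content ends up stored once.
    """

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def add(self, data: bytes) -> str:
        # The address is derived purely from the content itself.
        key = hashlib.sha256(data).hexdigest()
        # Adding the same bytes again maps to the same key, so nothing is duplicated.
        self.blocks.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self.blocks[key]


store = ContentStore()
h1 = store.add(b"hello ipfs")
h2 = store.add(b"hello ipfs")
print(h1 == h2, len(store.blocks))  # True 1
```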

IPFS seems to work really well. You can add data and it stays in this global network of nodes. Anyone with the hash can access the data (note that all data on IPFS is public unless you run a private IPFS network), and the same data is never duplicated. The question you might be asking at this point is, "Can I replace my database with IPFS and become fully decentralised?"

While IPFS is great at storing your data, it does not guarantee that the data stays available if your IPFS node goes down. Confused? Imagine this: you start your IPFS node, run it for a month and store very important data on it. Then your machine breaks. You're thankful it's a public network, so other people will have your data, right? Not quite. IPFS never guaranteed that your data would always be available. No other node was interested in it, so no one pinned your data except your loyal node. Consequently, your data is lost, because no other node has a copy.
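The scenario above boils down to one rule: content can be fetched only while some online node still holds it. The Python sketch below models that with hypothetical `Node` objects and a placeholder hash string. It is a simplification that ignores blocks other nodes may still have cached (but not pinned) before garbage collection removes them.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical model of an IPFS node: an online flag plus the hashes it pins."""
    name: str
    online: bool = True
    pinned: set[str] = field(default_factory=set)


def is_retrievable(content_hash: str, nodes: list[Node]) -> bool:
    # Simplification: only pinned content counts; unpinned cached blocks are ignored.
    return any(n.online and content_hash in n.pinned for n in nodes)


mine = Node("my-node", pinned={"my-important-data"})
others = [Node("peer-1"), Node("peer-2")]  # these nodes never pinned our data

print(is_retrievable("my-important-data", [mine, *others]))  # True
mine.online = False  # the machine breaks
print(is_retrievable("my-important-data", [mine, *others]))  # False
```

Pinning the same hash on a second, independent node flips the result back to `True`, which is what each of the solutions listed below tries to achieve in its own way.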

This is where Filecoin comes in. Filecoin incentivizes other nodes to store your data in exchange for a small fee on your end, so that your data is not lost if your node goes down. Filecoin is awesome, let's use it! Well, it's not there yet… At the time of writing, Filecoin is still under development and will only be available at some point in the future. For now, we have to work with current solutions, which include:

- Running multiple IPFS nodes
- Backing up the ~/.ipfs folder
- Running IPFS Cluster
- Using something else (e.g. Storj)

There are more solutions than these, but the one I want to talk about is IPFS Cluster. IPFS Cluster lets you run many nodes that each replicate the data, and it keeps them synchronised using the Raft consensus algorithm. Let's look at how many nodes we need to keep the data safe. In general, a node may go down or lose its internet connection. If up to t nodes can fail at the same time, we need at least t+1 nodes holding the data; in simple terms, at least one node must still be up and running even if all the others are down. So if you think 5 nodes can never all be down at once, your risk factor t is 4, and 5 nodes are enough to keep a copy of the data. The consensus layer is stricter, though: Raft can only accept new pins while a majority of peers is reachable, so tolerating t failures there takes 2t+1 peers.
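The arithmetic above can be written out as a small Python sketch. The t+1 rule covers data survival and assumes every node holds a full copy of the data. The 2t+1 rule is the standard majority (quorum) requirement of the Raft algorithm, included for comparison. The function names are hypothetical.

```python
def min_nodes_for_data(t: int) -> int:
    """Nodes holding the data so that one copy survives t simultaneous failures."""
    return t + 1


def min_peers_for_raft(t: int) -> int:
    """Raft peers needed so that a majority is still up after t failures."""
    return 2 * t + 1


def data_survives(total: int, failed: int) -> bool:
    # Assumes full replication: the data is available if at least one node is up.
    return total - failed >= 1


def raft_has_quorum(total: int, failed: int) -> bool:
    # Raft only makes progress while a strict majority of peers is up.
    return total - failed > total // 2


t = 4  # "5 nodes can never all be down at once"
print(min_nodes_for_data(t), min_peers_for_raft(t))  # 5 9
print(data_survives(5, 4), raft_has_quorum(5, 4))  # True False
```

With five peers, a copy of the data survives four failures, but the cluster can no longer agree on new pins once three peers are down.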

IPFS Cluster uses IPFS nodes to store data and is built on top of standard IPFS, which means using the cluster does not require a different kind of IPFS node: it is the exact same node you would run with plain IPFS. Each cluster peer runs alongside its IPFS daemon and exposes a proxy endpoint that mimics the IPFS API. Pin requests sent through it are recorded in the cluster's shared pinset, which the peers keep consistent using Raft (a leader-based consensus algorithm).
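A rough way to picture the proxy: it offers the same calls as an IPFS node, intercepts pin requests to record them cluster-wide, and forwards everything else to the local daemon. The Python sketch below uses hypothetical in-memory classes (`IpfsNode`, `ClusterProxy`) rather than the real IPFS or IPFS Cluster HTTP APIs, and it skips the consensus step entirely.

```python
class IpfsNode:
    """Hypothetical in-memory stand-in for a plain IPFS daemon."""

    def __init__(self) -> None:
        self.pins: set[str] = set()

    def pin_add(self, cid: str) -> None:
        self.pins.add(cid)

    def cat(self, cid: str) -> str:
        return f"<contents of {cid}>"


class ClusterProxy:
    """Hypothetical proxy: same methods as IpfsNode, but pins are also
    recorded in a shared pinset that every cluster peer would replicate."""

    def __init__(self, local: IpfsNode, shared_pinset: list[str]) -> None:
        self.local = local
        self.shared_pinset = shared_pinset

    def pin_add(self, cid: str) -> None:
        # Intercepted: the pin is recorded cluster-wide first...
        self.shared_pinset.append(cid)
        # ...and then applied to the local IPFS daemon as usual.
        self.local.pin_add(cid)

    def __getattr__(self, name: str):
        # Any other call is forwarded unchanged, so clients can treat
        # the proxy exactly like an IPFS node.
        return getattr(self.local, name)


shared: list[str] = []
proxy = ClusterProxy(IpfsNode(), shared)
proxy.pin_add("cid-1")
print(shared, proxy.cat("cid-1"))  # ['cid-1'] <contents of cid-1>
```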

The idea is that you interact with the IPFS Cluster proxy, which handles synchronisation and replication, so you can treat it exactly as an IPFS node. IPFS Cluster peers exchange messages with each other, such as a request to pin a hash or heartbeat messages used to detect whether a peer is down. One of the biggest advantages of this approach is that once a node comes back up, it can receive all the messages it has missed and be up to date in no time.
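The catch-up behaviour resembles Raft log replication: the leader keeps an ordered log of pin and unpin operations, and each heartbeat sends a follower whatever entries it is missing. The Python sketch below is heavily simplified (no elections, terms or commit rules) and uses hypothetical names. It only shows how a peer that was offline rebuilds the same pinset once it reconnects.

```python
from dataclasses import dataclass, field

Entry = tuple[str, str]  # (operation, content hash), e.g. ("pin", "cid-1")


@dataclass
class Peer:
    name: str
    online: bool = True
    log: list[Entry] = field(default_factory=list)

    def pinset(self) -> set[str]:
        # Replaying the log in order rebuilds the current set of pins.
        pins: set[str] = set()
        for op, cid in self.log:
            if op == "pin":
                pins.add(cid)
            elif op == "unpin":
                pins.discard(cid)
        return pins


class Leader:
    """Simplified Raft-style leader: no elections, terms or commit rules."""

    def __init__(self, followers: list[Peer]) -> None:
        self.log: list[Entry] = []
        self.followers = followers

    def submit(self, op: str, cid: str) -> None:
        # A new request becomes the next entry in the shared log.
        self.log.append((op, cid))
        self.heartbeat()

    def heartbeat(self) -> None:
        # Send each reachable follower the entries it has not seen yet.
        # Assumes a follower's log is always a prefix of the leader's log.
        for f in self.followers:
            if f.online and len(f.log) < len(self.log):
                f.log.extend(self.log[len(f.log):])


a, b, c = Peer("a"), Peer("b"), Peer("c")
leader = Leader([a, b, c])

c.online = False  # c misses everything below
leader.submit("pin", "cid-1")
leader.submit("pin", "cid-2")
leader.submit("unpin", "cid-1")

c.online = True
leader.heartbeat()  # c catches up in a single round
print(c.pinset())  # {'cid-2'}
```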