Author: Zahoor Mohamed @jmohamedzahoor


Note: The official documentation for pinning can be found in the Swarm documentation. This post is a more detailed, developer-friendly version of the same information.


Distributed or decentralized file systems usually split uploaded files into small pieces and scatter them over the network. In master-agent systems, the master keeps track of which agent stores each chunk and each of its copies. Peer-to-peer systems usually follow a content-based addressing scheme in which the address of a chunk is simply the hash of the chunk itself, and the routing layer takes care of finding the content given that hash.
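To make content-based addressing concrete, here is a minimal Go sketch of a content-addressed store, in which a chunk's key is always the hash of its bytes, so anyone holding the address can verify the data they get back. It is an illustration only: SHA-256 stands in for Swarm's actual chunk hash, and `contentStore`, `put` and `get` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// chunkAddress derives a chunk's address from its content.
// Assumption: SHA-256 stands in for Swarm's real chunk hash so that the
// sketch only needs the standard library.
func chunkAddress(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// contentStore is a toy content-addressed store: the key is always the
// hash of the value.
type contentStore map[string][]byte

// put stores a copy of data under its content address and returns that address.
func (s contentStore) put(data []byte) string {
	addr := chunkAddress(data)
	s[addr] = append([]byte(nil), data...)
	return addr
}

// get returns the data for addr, reporting false if it is missing or no
// longer hashes to the requested address.
func (s contentStore) get(addr string) ([]byte, bool) {
	data, ok := s[addr]
	if !ok || chunkAddress(data) != addr {
		return nil, false
	}
	return data, true
}

func main() {
	store := contentStore{}
	addr := store.put([]byte("hello swarm"))
	data, ok := store.get(addr)
	fmt.Println(addr, ok, string(data))
}
```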

Why Pinning?

When a file is uploaded to Swarm, it is split into multiple pieces, called chunks, which are scattered around the network in an orderly fashion. Every node that participates in the Swarm network pledges some storage. Because Swarm node addresses and chunk addresses share the same address space, each node is expected to store a share of the chunks that matches the storage it has pledged.
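The idea that nodes and chunks live in one address space can be sketched as follows: a chunk belongs with the node whose address is closest to the chunk's address. This sketch assumes, as in Kademlia, that closeness is the XOR distance between addresses, and it derives toy node and chunk addresses with SHA-256; `nearestNode` is a hypothetical helper, not Swarm's routing code.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// xorDistance returns a XOR b byte by byte. Comparing two such results
// with bytes.Compare orders them as big-endian integers.
// Both inputs must have the same length.
func xorDistance(a, b []byte) []byte {
	d := make([]byte, len(a))
	for i := range a {
		d[i] = a[i] ^ b[i]
	}
	return d
}

// nearestNode returns the index of the node whose address is closest to
// chunkAddr under XOR distance, or -1 if there are no nodes.
func nearestNode(chunkAddr []byte, nodes [][]byte) int {
	best := -1
	var bestDist []byte
	for i, n := range nodes {
		d := xorDistance(chunkAddr, n)
		if best == -1 || bytes.Compare(d, bestDist) < 0 {
			best, bestDist = i, d
		}
	}
	return best
}

func main() {
	// Hypothetical node addresses derived from node names.
	var nodes [][]byte
	for i := 0; i < 5; i++ {
		h := sha256.Sum256([]byte(fmt.Sprintf("node-%d", i)))
		nodes = append(nodes, h[:])
	}
	chunk := sha256.Sum256([]byte("some chunk data"))
	fmt.Println("nearest node index:", nearestNode(chunk[:], nodes))
}
```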

When the storage used by a Swarm node reaches its pledged limit, Swarm garbage collects the least recently accessed chunks. So, over time, parts of the uploaded content may be garbage collected (GC’d) and vanish from the network. If a client tries to retrieve a file whose chunks are no longer present in the network, the Swarm client receives a 404 error.

Most importantly, persistent storage lets Swarm be used in real applications, which makes it MVP-ready.

Long-Term Solution

This is a problem for all unpopular content. The long-term solution is to incentivize Swarm nodes to store insured content. Say you want to store your birth certificate, which you access only once or twice a year. Normally it would be garbage collected, but if you promise to pay the storers of its chunks for their service, they will keep storing them and your certificate will stay accessible all year round.

Swarm’s incentive layer is not yet implemented. It is built on the Swap, Swear and Swindle protocols, which will make the entire procedure trustless.

Short-Term Solution

Implementing the Swarm incentivization protocols will take some time; until then, the problem is addressed by pinning. Pinning stores a copy of the uploaded content in the local Swarm node, so if the content is garbage collected elsewhere in the network, the local node can still serve it. Any client can contact that local node (usually called a gateway node) and get the content.

When a file is uploaded with the “x-swarm-pin: true” HTTP header, it is uploaded to Swarm and also pinned on the local node.

File Upload: When a file is uploaded to Swarm, its file stream goes through the chunker. The chunker splits the file into 4 KB (4096-byte) pieces and sends them to be stored in the “Chunk Index” of the local data store. Think of the Chunk Index as a key-value store where the key is the chunk’s address and the value is the concatenation of a timestamp, the Kademlia bucket ID and the chunk data itself. This happens whether or not the file is uploaded with the pinning header.
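A simplified Go sketch of the upload path above: split the data into 4096-byte chunks, address each chunk by its hash, and store the timestamp, bucket ID and data as one concatenated value. Assumptions, all hypothetical: SHA-256 replaces Swarm's chunk hash, the intermediate tree chunks that the real chunker also produces are omitted, the bucket ID is simply passed in, and the 8-byte timestamp plus 1-byte bucket ID layout is invented for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"time"
)

const chunkSize = 4096 // Swarm chunks carry up to 4 KB of data

// split cuts data into consecutive pieces of at most chunkSize bytes.
// Simplification: the real chunker also builds intermediate tree chunks
// that reference these data chunks; that step is omitted here.
func split(data []byte) [][]byte {
	var chunks [][]byte
	for len(data) > 0 {
		n := chunkSize
		if len(data) < n {
			n = len(data)
		}
		chunks = append(chunks, data[:n])
		data = data[n:]
	}
	return chunks
}

// encodeIndexValue concatenates timestamp, bucket ID and chunk data.
// The field widths (8-byte big-endian timestamp, 1-byte bucket ID) are hypothetical.
func encodeIndexValue(ts int64, bucketID uint8, data []byte) []byte {
	v := make([]byte, 9, 9+len(data))
	binary.BigEndian.PutUint64(v[:8], uint64(ts))
	v[8] = bucketID
	return append(v, data...)
}

// chunkIndex maps a chunk address to its encoded value.
type chunkIndex map[[32]byte][]byte

// storeFile splits data, writes every chunk into idx and returns the
// chunk addresses in order.
func storeFile(idx chunkIndex, data []byte, bucketID uint8) [][32]byte {
	var addrs [][32]byte
	now := time.Now().UnixNano()
	for _, c := range split(data) {
		addr := sha256.Sum256(c)
		idx[addr] = encodeIndexValue(now, bucketID, c)
		addrs = append(addrs, addr)
	}
	return addrs
}

func main() {
	idx := chunkIndex{}
	data := make([]byte, 10000)
	for i := range data {
		data[i] = byte(i / 7)
	}
	addrs := storeFile(idx, data, 3)
	fmt.Println("chunks stored:", len(addrs))
}
```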

Pin Index: The Pin Index is a separate index in the local data store, dedicated to managing pinned chunks. When a file is uploaded with the pinning header set to “true”, this process kicks in after the usual File Upload process described above: every chunk of the pinned file is recorded in the Pin Index, which tracks how many times each chunk has been pinned. Here the key is the chunk address and the value is a counter called the “Pin Counter”.

Pinning Process: The first time a chunk is pinned, a new row is added to the Pin Index and its Pin Counter is initialized to 1. Every subsequent pin of the same chunk increments the Pin Counter.

Unpinning Process: When a file is unpinned, the operation walks through all the chunks belonging to that content (starting from the root) and decrements each chunk’s Pin Counter by 1. When a Pin Counter reaches 0, no one is interested in that chunk anymore, so its entry is removed from the Pin Index.
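The Pin Counter bookkeeping described in the last three paragraphs can be sketched as a map from chunk address to counter. This is a hypothetical in-memory model (`pinIndex`, `pin` and `unpin` are illustrative names), not Swarm's local store, and the chunk addresses are placeholders.

```go
package main

import "fmt"

// pinIndex maps a chunk address to its Pin Counter.
type pinIndex map[string]uint64

// pin records one more pin for every chunk address of a file.
func (p pinIndex) pin(chunks []string) {
	for _, addr := range chunks {
		p[addr]++ // the first pin creates the entry with counter 1
	}
}

// unpin decrements each chunk's counter and deletes entries that reach 0.
func (p pinIndex) unpin(chunks []string) {
	for _, addr := range chunks {
		c, ok := p[addr]
		if !ok {
			continue // not pinned; nothing to do
		}
		if c <= 1 {
			delete(p, addr)
		} else {
			p[addr] = c - 1
		}
	}
}

func main() {
	idx := pinIndex{}
	file := []string{"a1", "b2", "c3"} // hypothetical chunk addresses
	idx.pin(file)
	idx.pin(file)
	idx.unpin(file)
	fmt.Println(idx["a1"]) // prints 1
	idx.unpin(file)
	_, stillPinned := idx["a1"]
	fmt.Println(stillPinned) // prints false
}
```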

Garbage Collection: Garbage collection is triggered when a Swarm node reaches its storage threshold. Normally, it goes through the Chunk Index sorted by access time and removes the least recently used chunks, which deletes their data from this node. If this node is the nearest node to a removed chunk’s address, that chunk is then no longer available in the Swarm network at all.

However, garbage collection skips every chunk present in the Pin Index, because each of those chunks is pinned by at least one party. The bottom line is that pinned chunks survive GC and remain on the node until they are unpinned.
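Here is a minimal sketch of a garbage collection pass that respects the Pin Index: chunks are ordered by last access time, and the oldest unpinned ones are evicted until the node is back under capacity. The `storedChunk` type, `collectGarbage` and its `capacity` parameter are hypothetical. Note that if most chunks are pinned, the node can stay above capacity, which is exactly the trade-off pinning makes.

```go
package main

import (
	"fmt"
	"sort"
)

// storedChunk is a simplified Chunk Index entry.
type storedChunk struct {
	addr       string
	accessTime int64 // last access, e.g. Unix nanoseconds
}

// collectGarbage evicts the least recently accessed chunks until at most
// capacity chunks remain, never evicting addresses present in pinned.
// It returns the surviving chunks (oldest first) and the evicted addresses.
func collectGarbage(chunks []storedChunk, pinned map[string]uint64, capacity int) ([]storedChunk, []string) {
	sorted := append([]storedChunk(nil), chunks...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].accessTime < sorted[j].accessTime })

	excess := len(sorted) - capacity
	var kept []storedChunk
	var evicted []string
	for _, c := range sorted {
		// A missing key reads as 0, i.e. the chunk is not pinned.
		if excess > 0 && pinned[c.addr] == 0 {
			evicted = append(evicted, c.addr)
			excess--
			continue
		}
		kept = append(kept, c)
	}
	return kept, evicted
}

func main() {
	chunks := []storedChunk{{"a", 1}, {"b", 2}, {"c", 3}, {"d", 4}}
	pinned := map[string]uint64{"a": 1} // "a" is the oldest but is pinned once
	kept, evicted := collectGarbage(chunks, pinned, 2)
	fmt.Println(len(kept), evicted) // prints 2 [b c]
}
```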

Pinning Information: Swarm is all about chunks. Beyond the chunker, content is addressed only by chunk, not by file name. But people think in terms of files and directories, so our HTTP API exposes pinning and unpinning for files and collections of files. Once the pinning process is over, this high-level information is stored in a separate database called the “State Store”. It includes the top (root) hash, the pin counter, the content size and other vital information.
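The split between the file-level State Store and the chunk-level Pin Index might look like this. Both stores are plain in-memory maps here, and `fileRecord`, `pinner` and `pinFile` are hypothetical names; the record fields follow the information listed above (root hash, pin counter, content size) plus the IsRaw flag that appears in the HTTP listing below.

```go
package main

import "fmt"

// fileRecord is a hypothetical State Store entry for one pinned root hash.
type fileRecord struct {
	RootHash   string
	FileSize   uint64
	IsRaw      bool
	PinCounter uint64
}

// pinner keeps file-level records (the State Store) alongside chunk-level
// counters (the Pin Index). Both are in-memory maps in this sketch.
type pinner struct {
	stateStore map[string]*fileRecord // key: root hash of the file
	pinIndex   map[string]uint64      // key: chunk address
}

func newPinner() *pinner {
	return &pinner{
		stateStore: map[string]*fileRecord{},
		pinIndex:   map[string]uint64{},
	}
}

// pinFile pins every chunk of a file, then creates or updates the file's
// State Store record and increments its file-level pin counter.
func (p *pinner) pinFile(root string, chunks []string, size uint64, isRaw bool) {
	for _, c := range chunks {
		p.pinIndex[c]++
	}
	rec, ok := p.stateStore[root]
	if !ok {
		rec = &fileRecord{RootHash: root, FileSize: size, IsRaw: isRaw}
		p.stateStore[root] = rec
	}
	rec.PinCounter++
}

func main() {
	p := newPinner()
	chunks := []string{"root-1", "data-1", "data-2"} // hypothetical chunk addresses
	p.pinFile("root-1", chunks, 12046, false)
	p.pinFile("root-1", chunks, 12046, false)
	fmt.Printf("%+v\n", *p.stateStore["root-1"])
	// prints {RootHash:root-1 FileSize:12046 IsRaw:false PinCounter:2}
	fmt.Println(p.pinIndex["data-1"]) // prints 2
}
```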

HTTP APIs

All HTTP API examples below assume that the Swarm node is running locally, with its HTTP API on port 8500.

1) Upload and pin a file by adding the “x-swarm-pin: true” header

First, create a file called files.tar by archiving a directory with tar, then upload it:

curl -H "Content-Type: application/x-tar" -H "x-swarm-pin: true" --data-binary @files.tar http://localhost:8500/bzz:/

2) List all pinned content and its information

curl -X GET http://localhost:8500/bzz-pin:/

[
  {
    "Address": "0x94f78a45c7897957809544aa6d68aa7ad35df695713895953b885aca274bd955",
    "IsRaw": "false",
    "FileSize": "12046",
    "PinCounter": "2"
  },
  {
    "Address": "0xccef599d1a13bed9989e424011aed2c023fce25917864cd7de38a761567410b8",
    "IsRaw": "true",
    "FileSize": "146",
    "PinCounter": "5"
  }
]

3) Unpin a file that has already been pinned

curl -X DELETE http://localhost:8500/bzz-pin:/<MANIFEST OR ENS NAME OR SWARM RAW FILE HASH>

The screencast below shows how these commands work in a real Swarm node.

https://drive.google.com/open?id=1EukNsOgxZY84jDgs9iyh0tz599DzsJCd
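For developers calling the API from code rather than curl, the three requests above could be built with Go's standard net/http package roughly as follows. This is a sketch: it only constructs the requests and parses a sample listing, so it runs without a node; against a live node you would send each request with `http.DefaultClient.Do` and check the status code. The helper names are hypothetical, and the `,string` JSON options assume the quoted values shown in the sample listing above.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// baseURL assumes a local node on port 8500, as in the curl examples.
const baseURL = "http://localhost:8500"

// newPinnedUploadRequest mirrors example 1: POST a tar archive to bzz:
// with the x-swarm-pin header set to "true".
func newPinnedUploadRequest(tarData []byte) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, baseURL+"/bzz:/", bytes.NewReader(tarData))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/x-tar")
	req.Header.Set("x-swarm-pin", "true")
	return req, nil
}

// pinnedContent mirrors one entry of the listing in example 2. The
// ",string" options match the quoted values in that sample response;
// drop them if your node returns plain JSON booleans and numbers.
type pinnedContent struct {
	Address    string
	IsRaw      bool   `json:",string"`
	FileSize   uint64 `json:",string"`
	PinCounter uint64 `json:",string"`
}

// parsePinList decodes the body returned by GET /bzz-pin:/.
func parsePinList(body []byte) ([]pinnedContent, error) {
	var list []pinnedContent
	if err := json.Unmarshal(body, &list); err != nil {
		return nil, err
	}
	return list, nil
}

// newUnpinRequest mirrors example 3: DELETE /bzz-pin:/<reference>, where
// the reference is a manifest hash, an ENS name or a raw file hash.
func newUnpinRequest(ref string) (*http.Request, error) {
	return http.NewRequest(http.MethodDelete, baseURL+"/bzz-pin:/"+url.PathEscape(ref), nil)
}

func main() {
	up, err := newPinnedUploadRequest([]byte("placeholder tar bytes"))
	if err != nil {
		panic(err)
	}
	fmt.Println(up.Method, up.URL, up.Header.Get("x-swarm-pin"))

	sample := []byte(`[{"Address":"0xccef599d1a13bed9989e424011aed2c023fce25917864cd7de38a761567410b8","IsRaw":"true","FileSize":"146","PinCounter":"5"}]`)
	list, err := parsePinList(sample)
	if err != nil {
		panic(err)
	}
	for _, p := range list {
		fmt.Printf("%s raw=%t size=%d pins=%d\n", p.Address, p.IsRaw, p.FileSize, p.PinCounter)
	}

	del, err := newUnpinRequest(list[0].Address)
	if err != nil {
		panic(err)
	}
	fmt.Println(del.Method, del.URL)
	// Against a running node, send any of these with http.DefaultClient.Do(req).
}
```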

Swarm MVP

With this feature, individuals can run a single Swarm node that is exposed to the outside world (a Swarm Gateway) with the --pinning-enabled flag and run their DApps on Swarm. This is one step closer to running fully decentralised applications on Swarm. Developers can use it to pin their DApps on a Swarm Gateway and expose them to their users.

Global Pinning

Today, once pinned content has been garbage collected from the rest of the network, it can only be accessed through the Swarm node on which it was pinned; other Swarm nodes can access the content only until it is GC’d. Global Pinning would make a file pinned on one Swarm node accessible from any other Swarm node, even after the content has been GC’d in the Swarm network.

The next logical step would be for developers to write smart contracts and sell simple storage solutions as operators. Once incentivization in Swarm is done, more sophisticated cases can be handled, which will give rise to more storage based applications over Swarm.

Let’s stay in touch!

The Swarm team is reachable on Mattermost.

Join discussions about Swarm on the /r/ethswarm and /r/ethereum subreddits.

Please feel free to reach out via info@ethswarm.org.
