Blockstack: A decentralized naming and storage system using blockchain

Introduction

In this post, I intend to cover Blockstack, a public benefit corporation, and one of their foundational papers. Blockstack has been a pioneer in the DApp space for quite some time and has been using blockchains to build a DNS equivalent and a PKI/CA equivalent, without relying on root servers or central authorities. I discovered their work through a post on fat protocols and thin applications by Union Square Ventures. The idea is that the web in its current form consists of thin protocols like TCP/IP and fat applications like Facebook and Google, and most of the value is captured by the applications running on top.

Blockchains and token mechanisms can incentivize the creation of fat protocols in a decentralized network, where the data is captured in the protocol layer itself. The applications on top can be thinner, and no single application ends up owning a big chunk of the value. In a social network, for example, the value could be the user data. In the DApp world, important user data can remain mobile across different apps.

Before going any further, note that this post assumes the reader has some technical familiarity with bitcoin. You can also read more on it here if you want.

Applications using blockchain

Two natural applications that can benefit from decentralization are DNS and identity/PKI infrastructure. DNS maps domain names to IP address records. An identity service issues a global identifier to each user and associates cryptographic keys with that identity. A third application is associating storage with these identities.

Using blockchain for establishing Name-Value ownership

What a bitcoin-like blockchain gives you is a way to establish global consensus on an append-only log in a network of nodes that do not trust each other. In the bitcoin network, nodes are incentivized to maintain a global view of this append-only log. The log can model a system's state transitions: each transition is recorded in the log and then accepted by the network via consensus.

Both DNS and identity services map a key to a value. Ownership assignments and transfers of such key-value pairs can be modeled in an append-only log: the log records every such transaction as an operation, and a system can read the log from the beginning to reconstruct the most recent state of the key-value map.
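To make the idea concrete, here is a minimal sketch of rebuilding a name map by replaying a log. The operation names ("register", "update", "transfer") and the tuple layout are my own assumptions for illustration, not the format used in the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical log entry layout: (operation, name, sender, payload).
Entry = Tuple[str, str, str, str]


def replay(log: List[Entry]) -> Dict[str, Tuple[str, str]]:
    """Rebuild the name -> (owner, value) map by reading the log in order."""
    state: Dict[str, Tuple[str, str]] = {}
    for op, name, sender, payload in log:
        if op == "register":
            # First registration wins; later attempts for the same name are ignored.
            if name not in state:
                state[name] = (sender, payload)
        elif op == "update":
            # Only the current owner may change the value.
            if name in state and state[name][0] == sender:
                state[name] = (sender, payload)
        elif op == "transfer":
            # For a transfer, the payload is the new owner; the value is kept.
            if name in state and state[name][0] == sender:
                state[name] = (payload, state[name][1])
    return state


log = [
    ("register", "example.id", "alice", "10.0.0.1"),
    ("register", "example.id", "mallory", "6.6.6.6"),  # ignored: name already taken
    ("update", "example.id", "alice", "10.0.0.2"),
    ("transfer", "example.id", "alice", "bob"),
]
print(replay(log))
```

Every node that replays the same log in the same order ends up with the same map, which is why agreeing on the log is enough to agree on who owns what.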

With an append-only log (the blockchain) as the foundation, it is possible to build a DNS-like naming service in which domain names are mapped to IP/DNS records. Assignment or transfer of ownership of a domain-to-IP mapping is agreed upon by the network and recorded in the blockchain, i.e. in the append-only log. The obvious advantage of this system is that no centralized authority issues and maintains the mapping, so nobody can easily seize a domain without getting access to the private keys of the domain owner. We also don't have to worry about attacks on root servers as central points of failure. Altering any name in the system also requires proof-of-work, which makes tampering costlier.

Before this paper, the common way to use a blockchain for a new application was to fork an existing one and build your own blockchain network. The authors took this approach with Namecoin, the DNS equivalent, and ran into a number of challenges running on a separate network.

Challenges in running your own blockchain network

The paper does a great job of highlighting the challenges in bootstrapping and maintaining your own network. Here is a brief summary of the challenges from the authors' operational experience with Namecoin. It also ties in nicely with the design choices that Blockstack opted for.

Security: The authors often found that miners pooled their resources together and launched 51% attacks on the network, effectively controlling what the network did. While it is attractive to fork the blockchain and introduce your own features, it is hard to do so while preserving the security properties of the network.

Network reliability and throughput: Software bugs brought registrations of new names to a halt. In one case, someone sent transactions with too many data fields, which caused miners to crash and effectively stopped any appends to the blockchain. Another issue was reduced network throughput: a large pool could, intentionally or otherwise, black-hole a lot of transactions, so a transaction was accepted only when it landed on a node outside that pool. The bitcoin network's security and stability properties were far superior to those of other networks at the time of writing.

Selfish mining: The authors noticed a pool with more than one third of the compute power mining to its own advantage, in such a way that a lot of blocks were rejected and some blocks were added in rapid succession.

Consensus-breaking changes: When a major software upgrade needs a hard fork, all miners need to upgrade their software, otherwise they cannot participate in the network. But miners may have little incentive to upgrade unless the network is as big as bitcoin. The authors saw that a lot of miners never came back online after such changes were introduced to their smaller network. This means it is better to separate consensus-breaking upgrades from other software releases and to give miners an incentive to upgrade.

Issues with merged mining: Bootstrapping a new blockchain is hard, since it is easy to take control of it in the beginning, when there are not yet enough nodes. Merged mining gives bitcoin miners an incentive to help bootstrap other blockchains. But the authors saw this fail: bitcoin miners could control disproportionate resources on the alternate chain through this auxiliary PoW mechanism.

Architecture of Blockstack

All the challenges mentioned above lead to the most fundamental design decision: using a mature blockchain like bitcoin for the control plane. But mature blockchains can be slow and costly, so you cannot really use them to store any considerable amount of data.

This leads to a further separation of the control and data planes. The blockchain is used for control information like name registrations or transfers, while the data plane stores data like DNS records or identity information. Even larger volumes of data can be stored (encrypted) in third-party storage systems like Dropbox or AWS. Users trust the control plane, but they need not trust the data plane in any way.

This leads to the 4-layer architecture of Blockstack:

Layer 1: A bitcoin-like blockchain that stores the authoritative global consensus on the system state.

Layer 2 (Virtualchain): A blockchain-agnostic layer that takes input from the blockchain and can create an arbitrary type of state machine. For example, the DNS state machine can be different from the identity state machine. This layer can work with any blockchain you want, but its reliability and security properties derive from the underlying blockchain. Virtualchain also binds names to their values: the hash of each name's zonefile is stored in this layer.

Layer 3 (Routing): This layer implements a DHT that stores the routing information for the values. Blockstack uses DNS-like zonefiles to indicate the final storage location of the data. In short, layer 3 has the job of discovering where the data associated with a given name lives. Any user can verify the integrity of a zonefile by checking it against the hash stored in layer 2.
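Here is a minimal sketch of that lookup-and-verify step. The use of SHA-256, the idea that the routing table is keyed by the zonefile hash, and the sample zonefile contents are all assumptions made for this illustration.

```python
import hashlib
from typing import Dict, Optional


def zonefile_hash(zonefile: str) -> str:
    """Hash a zonefile. SHA-256 is an assumption made for this sketch."""
    return hashlib.sha256(zonefile.encode("utf-8")).hexdigest()


def fetch_verified(name: str,
                   layer2_hashes: Dict[str, str],
                   routing_table: Dict[str, str]) -> Optional[str]:
    """Fetch a zonefile from the untrusted routing layer and verify it.

    layer2_hashes stands in for the name -> zonefile hash binding in layer 2.
    routing_table stands in for the DHT, assumed here to be keyed by hash.
    """
    expected = layer2_hashes.get(name)
    if expected is None:
        return None
    zonefile = routing_table.get(expected)
    if zonefile is None or zonefile_hash(zonefile) != expected:
        return None  # missing or tampered with: reject
    return zonefile


# Made-up zonefile pointing at a made-up storage URL.
zonefile = '$ORIGIN example.id\n_data URI 10 1 "https://storage.example/alice"\n'
layer2 = {"example.id": zonefile_hash(zonefile)}
honest_dht = {zonefile_hash(zonefile): zonefile}
tampered_dht = {zonefile_hash(zonefile): zonefile.replace("alice", "mallory")}

print(fetch_verified("example.id", layer2, honest_dht) == zonefile)
print(fetch_verified("example.id", layer2, tampered_dht))
```

Because the hash comes from the control plane, the routing layer can be run by anyone: a bad node can withhold a zonefile but cannot forge one.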

Layer 4 (Storage): This is where all values are stored. The storage can be AWS, Dropbox or any other third-party vendor. There are two types of storage:

Fast mutable storage:

This data is signed with the private key of the name's owner, so a write only incurs the overhead of signing. A read involves verifying the integrity of the zonefile against layer 2 and then using the owner's public key to verify the data. Note that a write does not involve any changes to the zonefile.

Slow immutable storage:

In this case, in addition to writing the data to storage, the zonefile is modified to add a TXT record containing the hash of the data. Since the zonefile has changed, its hash changes too, which triggers an update in the virtualchain where that hash is stored. This in turn means a new transaction on the underlying blockchain, which is why this mode is slow.
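A small sketch of the immutable write path shows why one data write ripples up through the layers. The TXT record layout, the "profile" label and the use of SHA-256 are assumptions for illustration only.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def add_immutable_record(zonefile: str, label: str, data: bytes) -> str:
    """Return a new zonefile with a TXT record holding the hash of the data."""
    return zonefile + f'{label} TXT "{sha256_hex(data)}"\n'


def verify_immutable(zonefile: str, label: str, data: bytes) -> bool:
    """Check data fetched from untrusted storage against the TXT record."""
    expected = f'{label} TXT "{sha256_hex(data)}"'
    return expected in zonefile.splitlines()


old_zonefile = "$ORIGIN example.id\n"
new_zonefile = add_immutable_record(old_zonefile, "profile", b"hello world")

# The zonefile changed, so the hash kept in layer 2 is stale and must be
# replaced, which costs a transaction on the underlying blockchain.
print(sha256_hex(old_zonefile.encode()) != sha256_hex(new_zonefile.encode()))
print(verify_immutable(new_zonefile, "profile", b"hello world"))
print(verify_immutable(new_zonefile, "profile", b"tampered"))
```

Mutable storage avoids this chain of updates because the zonefile stays the same and only the signature on the data is checked.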

Here is a diagram of the Blockstack architecture:

4-layer architecture that separates the control plane and the data plane. Layers 1 and 2 are the control plane; layers 3 and 4 are the data plane.

Control plane of Blockstack Naming System

Layers 1 and 2 of the Blockstack naming system are described below. Layers 3 and 4 are less interesting for the purposes of this discussion.

Layer 1 — Bitcoin blockchain

Blockstack uses namespaces in its naming system, similar to DNS. Names are owned by private keys on the bitcoin blockchain. Someone interested in owning a name goes through a two-step process: preorder, then register. The first entity to complete both steps on the blockchain gets to own the name. The preorder step does not reveal the name, which puts an attacker racing to grab the same name at a disadvantage.
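The two-step process is a commit-and-reveal scheme, which can be sketched as follows. The exact fields that go into the preorder hash are my assumption (the salt in particular is hypothetical), and real transactions carry more than these tuples.

```python
import hashlib
from typing import Dict, List, Set, Tuple


def preorder_hash(name: str, owner: str, salt: str) -> str:
    """Commitment that hides the name until the register step reveals it."""
    return hashlib.sha256(f"{name}|{owner}|{salt}".encode("utf-8")).hexdigest()


def process(transactions: List[Tuple[str, ...]]) -> Dict[str, str]:
    """Apply preorder and register transactions in blockchain order."""
    preorders: Set[str] = set()
    owners: Dict[str, str] = {}
    for tx in transactions:
        if tx[0] == "preorder":
            preorders.add(tx[1])
        elif tx[0] == "register":
            _, name, owner, salt = tx
            # A register counts only if a matching preorder came earlier
            # and nobody owns the name yet.
            if name not in owners and preorder_hash(name, owner, salt) in preorders:
                owners[name] = owner
    return owners


txs = [
    ("preorder", preorder_hash("john.id", "alice", "s3cret")),
    # Mallory sees nothing useful in the preorder, and a register without
    # her own earlier preorder is rejected.
    ("register", "john.id", "mallory", "guess"),
    ("register", "john.id", "alice", "s3cret"),
]
print(process(txs))
```

By the time the name becomes visible in the register transaction, an attacker would still need a preorder of their own on the chain first, and by then the original registration has already landed.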

Layer 2 — VirtualChain

Virtualchain maintains the state transitions for the naming system. For example, a simple registration of a new name goes through the following states on the virtualchain:

Absent -> Preordered -> Registered -> Revoked

Once the name is registered, one can update it, transfer it, and so on.
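The lifecycle above can be written down as a small transition table. The operation names and the choice to silently ignore invalid operations are assumptions of this sketch rather than details taken from the paper.

```python
from typing import Dict, Iterable

# state -> {operation -> next state}. Operation names are hypothetical.
TRANSITIONS: Dict[str, Dict[str, str]] = {
    "absent": {"preorder": "preordered"},
    "preordered": {"register": "registered"},
    "registered": {
        "update": "registered",
        "transfer": "registered",
        "revoke": "revoked",
    },
    "revoked": {},
}


def apply_operation(state: str, operation: str) -> str:
    """Return the next state, or the same state if the operation is invalid."""
    return TRANSITIONS[state].get(operation, state)


def final_state(operations: Iterable[str], start: str = "absent") -> str:
    state = start
    for operation in operations:
        state = apply_operation(state, operation)
    return state


print(final_state(["preorder", "register", "update", "transfer"]))
print(final_state(["register"]))  # no preorder first, so nothing happens
print(final_state(["preorder", "register", "revoke", "update"]))
```

A different application, such as an identity service, would plug in a different table while still reading the same underlying blockchain.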

The land grab that happens for popular domain names in DNS can be curbed in the Blockstack naming system by using smart pricing functions. For example, shorter names and namespaces are more expensive because they tend to be more popular, and names without numbers are more popular than names with numbers:

john.id > johnsmith.id > johns007.id
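A toy pricing function that reproduces this ordering might look like the sketch below. The base price, the division by length and the discount for digits are all made-up values for illustration; the actual pricing function is not reproduced here.

```python
def name_price(name: str, base: float = 1000.0, digit_discount: float = 4.0) -> float:
    """Toy price: shorter labels cost more, labels with digits cost less."""
    label = name.split(".")[0]  # price only the part before the namespace
    price = base / len(label)
    if any(ch.isdigit() for ch in label):
        price /= digit_discount
    return price


for candidate in ["john.id", "johnsmith.id", "johns007.id"]:
    print(candidate, name_price(candidate))
```

The point is only that price falls as names get longer or less desirable, so squatting on many short, popular names becomes expensive.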

Simple Name Verification

How does one verify a name? Verification depends on the blockchain, and traversing the whole blockchain is prohibitively expensive, especially on a mobile device: syncing the entire blockchain from scratch takes up to 1–2 days.

Blockstack addresses this with Simple Name Verification (SNV), which makes backward jumps of 2^i blocks and verifies the hashes of those blocks in the blockchain along with the state operations of each such block in the virtualchain.

Consider that you want to verify a name. The first thing to do is to locate the authoritative transactions for this name. Instead of doing a linear search backwards from the most recent trusted block, SNV takes steps of 1, 2, 4, 8, ... until it knows that the transaction of interest lies somewhere between two blocks. It can then repeat the same process between those two blocks. Here is the diagram of that:

Verifying a name that is present in transaction T. The first block is a trusted block, and SNV then takes backward steps to ensure the integrity of previous blocks.
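The jump pattern can be sketched in a few lines. This models only the sequence of block heights visited, under my assumption that each trusted block lets you verify the blocks 1, 2, 4, 8, ... positions behind it. The hash checks themselves are left out, and the function name is hypothetical.

```python
from typing import List


def snv_path(trusted_height: int, target_height: int) -> List[int]:
    """Block heights visited when walking back from a trusted block.

    Each hop is the largest power of two that does not overshoot the target,
    so the number of hops grows with log(distance), not with the distance.
    """
    if not 0 <= target_height <= trusted_height:
        raise ValueError("target must lie at or below the trusted block")
    path = [trusted_height]
    current = trusted_height
    while current != target_height:
        step = 1
        # Double the step while the doubled jump still lands at or above the target.
        while current - 2 * step >= target_height:
            step *= 2
        current -= step
        path.append(current)
    return path


print(snv_path(100, 87))
print(len(snv_path(1_000_000, 0)) - 1, "hops instead of 1,000,000")
```

For a distance of 13 blocks the walk takes hops of 8, 4 and 1, and in general it needs at most about log2(distance) + 1 hops, which is what makes verification feasible on a small device.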

Performance characteristics

There is some performance overhead associated with both reads and writes. There is also an additional storage cost, but it doesn't seem prohibitively high. The design lets one leverage the performance characteristics of third-party cloud providers like AWS as much as possible. According to the authors, there are many low-hanging optimizations that can further improve read/write performance.

Conclusion

I found this paper to be a very interesting read. It was great to learn more about the production issues observed in running on a Namecoin-like network. The subsequent design decision to separate control and data, leveraging bitcoin for its security properties and third parties for high-performance bulk storage, also sounds like a great approach.