By Zhijie Ren and Peter Zhou

Scalability is one of the most important problems in blockchain and has been the focus of both industry practitioners and academic researchers since Bitcoin was born. This article is the start of a series of articles that focus on blockchain scalability, providing a systematic way to categorize various solutions and analyzing their pros and cons. Our goal is to allow the communities and the general public to have an in-depth view on the current development of this issue. At the end of the series, we will introduce some new ideas for scalable blockchains, resulting from our research at VeChain, and compare them with other related work.

Whether you are a blockchain researcher working in academia or a cryptocurrency enthusiast, you must have heard of the term “scalability” or “scalable blockchain”. It has been much talked about and hyped. However, a “scalable” blockchain is often treated as merely another name for a blockchain that can achieve a high TPS (transactions per second). Worse still, the true meaning of the term is sometimes twisted or even abused to mislead the public and gain undeserved advantages. On the other hand, we have seen numerous reports and articles written by research institutes, companies, or media, trying to investigate and objectively compare the scalability of various blockchains. However, hardly any of them has managed to avoid absorbing misinformation from those false statements.

This article aims to give you an idea of what the term “scalability” actually means in the domain of blockchain. Although the term is well defined in many scientific areas, it has quite a few meanings in blockchain, as you will see later in this article. We want to bring you the latest developments in blockchain scalability from both blockchain practitioners and, more importantly, academic researchers. We believe that it is crucial for the public to have a better understanding of the issue so that the community and industry can grow healthier and faster.

For most computer systems (e.g., a database or search engine), “scalability” refers to the system’s capability to handle a growing amount of work, or to scale [1]. A system does not scale well, or in other words has poor scalability, if coping with an increased workload requires additional effort to modify the system instead of simply allocating more resources (e.g., computing power, servers or bandwidth).

However, in the domain of blockchain, the word “scalability” has a much broader range of meanings. (Interestingly, even the term “blockchain” hasn’t been well defined from an academic point of view.) For instance, in one of the most important papers on the topic of blockchain scalability [2], any improvement of Bitcoin in terms of throughput, latency, bootstrap time, or cost per transaction was called “scaling”, and the resulting blockchain system was called scalable.

Nowadays, there are various blockchain systems that can be considered “scalable”, yet their throughputs differ greatly. Note that the word “scalable” is a comparative term in blockchain. When a blockchain system is called scalable, it indicates that the system achieves a higher TPS than some existing systems through modifying its consensus mechanism and/or adjusting some system parameter(s).

In fact, we can categorize scalable blockchains into four types:

Scaling Bitcoin: solutions that improve the throughput of Bitcoin by enlarging its block size or reducing its block interval, without changing Bitcoin’s POW consensus algorithm.

Scaling POW: solutions that still work within the Nakamoto consensus framework, but achieve a higher throughput than Bitcoin’s POW algorithm by modifying the algorithm.

Scaling Byzantine Fault Tolerance (BFT) Algorithms: solutions based on BFT algorithms, but with a lower message complexity than PBFT.

Scale-out Blockchains: solutions that relax the requirement that validating/mining nodes know the whole transaction history, so that the throughput of the system can grow as the network grows and therefore achieve better scalability than the above three types of systems.

Scaling Bitcoin

We all know that Bitcoin scales poorly. This is because the design of Bitcoin POW does not allow it to scale. In Bitcoin, POW is used as a randomized method to determine the next valid block, i.e., all nodes run POW for a certain amount of time to determine the winner. Moreover, a new block needs to be synchronized across the whole network so that every node competes in (roughly) the same race for the next block. Essentially, Bitcoin POW has a cascade structure as shown in the figure below.

Bitcoin POW has a cascade structure in which the consensus algorithm can only start after all nodes have finished receiving and validating all blocks.

It is acceptable for the synchronization to take 1 minute when the POW duration is 10 minutes (as in Bitcoin). However, Bitcoin would no longer be fair and secure if the synchronization time became comparable to each POW cycle, which would happen if the block size increased or the block interval decreased significantly, e.g., if the block interval were reduced to 1 minute. In such cases, we would see many forks appearing in the network, eventually resulting in a very long confirmation time and a reduced security level.

In other words, an implicit constraint in Bitcoin is that the runtime of each round of the consensus algorithm should be significantly longer than the synchronization period. How much time the synchronization needs depends not only on the design of the consensus algorithm, but largely on the characteristics of the underlying network, e.g., throughput, latency, topology, and decentralization level. In the paper “On scaling decentralized blockchains” [2], it is estimated that Bitcoin could not reach more than 27 TPS based on the Bitcoin network in 2016. This bound might not be applicable to an altcoin using the same POW algorithm for consensus, or even to Bitcoin today, as the networks differ in size or decentralization level. However, the aforementioned constraint still holds. Hence, the “naive” approaches that enlarge the block or reduce the block interval can only “scale” Bitcoin by a small margin.
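To make the trade-off concrete, here is a rough back-of-the-envelope sketch in Python. It is not taken from [2]: the 1 MB block size, the 250-byte average transaction, and the Poisson model of block discovery are all illustrative assumptions, and the function names are hypothetical. It estimates the throughput for a given block interval and the chance that a competing block is found while the previous one is still being synchronized.

```python
import math


def throughput_tps(block_size_bytes, avg_tx_bytes, block_interval_s):
    """Transactions per second for a chain producing one block per interval."""
    txs_per_block = block_size_bytes // avg_tx_bytes
    return txs_per_block / block_interval_s


def stale_block_probability(sync_time_s, block_interval_s):
    """Chance that a competing block is found during synchronization.

    Assumption: block discovery is a Poisson process whose mean
    inter-arrival time is block_interval_s.
    """
    return 1.0 - math.exp(-sync_time_s / block_interval_s)


# Illustrative parameters (assumptions, not measurements).
BLOCK_SIZE = 1_000_000  # bytes per block
AVG_TX = 250            # bytes per transaction
SYNC_TIME = 60          # seconds, the 1-minute figure used in the text

for interval in (600, 60):
    tps = throughput_tps(BLOCK_SIZE, AVG_TX, interval)
    stale = stale_block_probability(SYNC_TIME, interval)
    print(f"interval={interval}s tps={tps:.1f} stale probability={stale:.2f}")
```

Under these assumptions, cutting the interval from 10 minutes to 1 minute multiplies the throughput by ten, but the chance of a competing block during synchronization rises from roughly 10% to roughly 63%, which is the fork problem described above.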

Scaling POW

To address the problem stated above, new POW schemes have been proposed in which the security of the system does not hinge on the synchronization of new blocks, as illustrated by the figure below. In other words, the consensus period does not need to be significantly longer than the synchronization time, but can be made comparable or even equal to it. For example, in Bitcoin-NG [3], consensus is used only to determine the round leader instead of the whole transaction set. Thus the synchronization of the transactions can be done in parallel and a larger block size can be used. Some other blockchains in this category are Hybrid Consensus [4], Byzcoin [5], and GHOST [6].

A scalable POW would parallelize synchronization and validation with consensus, so that the full bandwidth could be used to transmit messages.

POS

From the perspective of scalability, we can also include some novel POS schemes in the scaling POW category. This is because in these systems, network consensus is achieved by leader selection mechanisms based on random number generators, which do not need to run for a long time to achieve fairness. Hence, they are not subject to the constraint that “the consensus period should be significantly larger than the synchronization time” and can straightforwardly adopt a large block size, the same as the scaling POW solutions. Some well-known projects are: Ouroboros [7], Snow White [8], Dfinity [9], and Algorand [10].

Scaling BFT

Byzantine Fault Tolerance (BFT) algorithms are a family of consensus algorithms that can tolerate faulty nodes behaving arbitrarily, and thus allow honest nodes to reach consensus in untrusted networks. They originated from the Byzantine generals problem proposed by Leslie Lamport and his co-authors in the early 1980s [11]. However, due to the lack of “real” use cases, a practical version of BFT only appeared in 1999, called Practical Byzantine Fault Tolerance (PBFT) [12].

PBFT has O(N²) message complexity, as can be observed in the prepare and commit phases. [12]

PBFT is an algorithm with O(N²) message complexity, as illustrated in the following figure, where N is the total number of validating/mining nodes in the network. The figure shows the five steps in each consensus round, and each arrow represents a message being sent from one node to another. It can be seen that, to reach consensus on one message, the message first has to be broadcast to all the nodes in the network and then be broadcast again by every node to every other node.

One of the major drawbacks of PBFT is that it scales poorly with the network size due to its O(N²) message complexity. It is easy to see that the number of messages sent between nodes for each transaction grows quadratically with the number of validating nodes in the network. Then, since the total bandwidth can only grow proportionally to the number of nodes, the throughput decreases as the network grows and, in principle, PBFT could not be used in a network with more than, for instance, 50 nodes.
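The quadratic growth can be illustrated with a small counting sketch. It uses a simplified model of one PBFT round (client request and replies are ignored, and the function names are hypothetical), so the exact constants are an assumption rather than a statement about any particular implementation.

```python
def pbft_messages_per_round(n):
    """Approximate number of messages needed to agree on one request.

    Simplified count for n validating nodes; the client request and the
    replies to the client are ignored.
    """
    pre_prepare = n - 1          # leader -> every other node
    prepare = (n - 1) * (n - 1)  # every non-leader node -> every other node
    commit = n * (n - 1)         # every node -> every other node
    return pre_prepare + prepare + commit


def per_node_load(n):
    """Average number of messages each node sends per request."""
    return pbft_messages_per_round(n) / n


for n in (4, 16, 64, 256):
    print(n, pbft_messages_per_round(n), per_node_load(n))
```

In this model the total is 2N(N-1) messages per request, so the average load per node grows linearly with N. With a fixed bandwidth per node, the number of requests each node can process per second therefore shrinks as the network grows.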

To tackle this problem, a few ideas have been proposed to scale classical BFT algorithms like PBFT. The first is called speculative BFT. The idea is very simple: nodes first speculatively assume that the network condition is good and the environment is trusted, and use simpler and more efficient schemes to try to reach consensus. If the attempt fails, they fall back to the much costlier PBFT. This is equivalent to trading “worst case latency” for “best case throughput”. Note that this type of BFT, for example Zyzzyva [13], existed even before the concept of blockchain. As the scalability issue became more and more important, the idea of speculative BFT was revisited and used by blockchain practitioners and researchers as a building block to construct blockchain systems such as Byzcoin, Algorand, and Thunderella [14].

Zyzzyva speculatively uses an O(N) message complexity scheme to reach consensus. [13]

The second idea is to deliberately remove the redundancy in the BFT process by using an information-theoretic tool called erasure coding, which improves the efficiency of bandwidth usage. For instance, Honeybadger-BFT [15] falls into this category. The third idea is to introduce randomness into the communication between nodes such that, after receiving a message, instead of hearing the opinions of all other peers to confirm it, each node simply listens to some randomly sampled nodes and makes a decision accordingly. Theoretically, the node will make a correct decision with high probability if the sample size is chosen properly and the sampling process is truly random. The consensus algorithm Avalanche [16] uses this idea to achieve better scalability.
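The role of the sample size can be illustrated with a short calculation. The sketch below is not the Avalanche protocol, which repeats sampling over many rounds; it only computes, for a single uniformly random sample, the probability that honest peers form a strict majority of the sample. The function name and the example numbers are assumptions made for illustration.

```python
from math import comb


def prob_sample_majority_honest(n, honest, k):
    """Probability that a random sample of k distinct peers out of n
    contains a strict majority of honest peers (hypergeometric model)."""
    faulty = n - honest
    favourable = 0
    for h in range(k // 2 + 1, k + 1):
        # h honest peers and k - h faulty peers in the sample;
        # comb() returns 0 when the requested count is not available.
        favourable += comb(honest, h) * comb(faulty, k - h)
    return favourable / comb(n, k)


# Example: 1000 peers, 800 of them honest (illustrative numbers).
for k in (10, 20, 40):
    print(k, prob_sample_majority_honest(1000, 800, k))
```

The probability approaches 1 quickly as the sample grows, even though each node hears from only a small fraction of the network, which is why the per-node message cost no longer has to grow with N.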

Scalable POW (POS) vs. Scalable BFT

Although they differ in both form and concept and are used to build different types of blockchains, the scalable POW (POS) and scalable BFT schemes mentioned above could have similar throughput in similar network settings. Ideally, both approaches should make maximum use of the bandwidth for message transmission and achieve an unhindered O(N) message complexity. 100–1000 TPS in a network with hundreds of nodes would be a rough approximation of the throughput of scalable POW (POS) or scalable BFT. In other words, if you see the term “scalable blockchain” nowadays, it mostly refers to these two types of “scalability”.

Directed Acyclic Graph (DAG)

It might surprise many people that DAG-based consensus algorithms also fall into this category, as many believe that they should be scale-out (a term explained in the next section) instead. However, the fact is that most DAGs, whether they are academic proposals like Phantom [17], Conflux [18], and Avalanche, or industrial projects like IOTA [19] and Hedera Hashgraph [20], require all messages to be known by every node. Phantom, Conflux, and IOTA could be seen as advanced versions of GHOST (scalable POW) that better parallelize consensus and synchronization. Avalanche and Hedera Hashgraph could be seen as speculative BFT algorithms that give high throughput under less strict BFT assumptions.

Scale-out Blockchains

This concept is closer to the original definition of “scalable” in distributed systems, in the sense that both a scale-out blockchain and a scalable distributed system enjoy higher throughput as the network grows. The fundamental difference between them is that the scalability defined in distributed systems requires the performance of the system to grow linearly with the number of servers (nodes), which is in general not achievable for blockchains because of decentralization.

Hence, blockchain researchers have aimed for a lower level of scalability, letting the throughput of the network grow sublinearly as the network size increases. The resulting schemes are mostly referred to as “scale-out” blockchains/schemes nowadays. You may not have heard of the term scale-out, but you have most likely heard of “sharding”, “Lightning Network [21]” or “Ethereum Plasma [22]”. They can all be considered scale-out solutions to the problem of blockchain scalability.

In a scale-out blockchain, some messages would never reach some nodes. Here, by “nodes” we mean those participating in both validation and consensus. In the context of Bitcoin, it would mean that miners needn’t know and validate all transactions. A serious consequence of this setting is that it increases the risk of double spending, since coins spent in a transaction could be spent again at nodes that do not know about the transaction. To prevent double spending while keeping this setting, we may need some nodes in the network to validate transactions on others’ behalf, which in fact re-introduces a certain level of centralization to the system. As a result, either security or decentralization is compromised. This problem is often known as the “blockchain scalability trilemma”. Because of the trilemma, there has been a debate about whether we should even pursue scale-out schemes at all.

The blockchain scalability trilemma. A more common version has “scalability” at the top. However, the “scalability” there should actually be “scale-out”.

As the schemes mentioned above suggest, there are two popular strategies to design and implement a scale-out blockchain: one is through sharding and the other through off-chain schemes.

Sharding is about dividing the whole network into sub-networks, namely “shards”, where nodes in each sub-network share a local ledger. Ideally, each node only needs to know, validate, and store the messages transmitted within its own shard instead of all messages. We can think of sharding as breaking the original blockchain into smaller blockchains, which are less secure since there are fewer nodes validating the transactions and participating in the consensus.

Therefore, the biggest challenges for the sharding strategy are: 1) how to secure each shard, and 2) how shards can efficiently and securely interact to handle inter-shard transactions. For example, if some coins are moved from shard A to shard B, the receiver in shard B should query multiple nodes from shard A about the validity of the coins to avoid being cheated by malicious senders. Many solutions have been proposed to address these two problems. Here we just list a few of them: Omniledger [23], Chainspace [24], Rchain [25], Sharding for Ethereum [26]. Their details are beyond the scope of this article.
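As a minimal illustration of the second challenge, the sketch below assigns accounts to shards by hashing their addresses and separates transactions that stay within one shard from those that cross shards. This is a generic toy scheme with hypothetical function names, not the assignment rule of any of the projects listed above.

```python
import hashlib


def shard_of(address, num_shards):
    """Map an account address to a shard by hashing it (toy rule)."""
    digest = hashlib.sha256(address.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def is_cross_shard(sender, receiver, num_shards):
    """True if the two accounts live in different shards."""
    return shard_of(sender, num_shards) != shard_of(receiver, num_shards)


def split_by_shard(transactions, num_shards):
    """Separate (sender, receiver, amount) tuples into per-shard local
    transactions and cross-shard transactions."""
    local = {shard: [] for shard in range(num_shards)}
    cross = []
    for tx in transactions:
        sender, receiver, _amount = tx
        if is_cross_shard(sender, receiver, num_shards):
            cross.append(tx)  # needs an inter-shard protocol
        else:
            local[shard_of(sender, num_shards)].append(tx)
    return local, cross
```

Only the transactions in `local` can be validated by a single shard on its own. Every entry in `cross` requires the receiving shard to obtain evidence from the sending shard, which is where the extra communication and the security questions come from.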

The off-chain schemes are largely inspired by Lightning Network, which uses some clever tricks to enable an off-chain channel between two nodes for fast payments without needing to register every transaction between them on Bitcoin. However, such convenience comes at a cost: both parties must put deposits on chain in advance to open the off-chain channel between them. Since then, many off-chain schemes have been proposed to go beyond Bitcoin and the application of fast payment. In particular, parties are allowed to interact through other types of messages, e.g., multi-party transactions, conditional payment transactions, and smart contract transactions. The challenge, then, is how to design and efficiently deploy such off-chain mechanisms with on-chain enforcement for different types of messages in different blockchains. Some widely discussed off-chain projects include Plasma, Polkadot [27], and Liquidity [28].
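The deposit-then-transact pattern can be sketched with a toy two-party channel. The class below is a hypothetical illustration only: it omits signatures, timelocks, and dispute handling, all of which a real channel such as Lightning Network depends on, and it simply counts how many transactions would touch the chain.

```python
class PaymentChannel:
    """Toy two-party channel: only opening and closing touch the chain."""

    def __init__(self, deposit_a, deposit_b):
        if deposit_a < 0 or deposit_b < 0:
            raise ValueError("deposits must be non-negative")
        # Both deposits are locked by a single on-chain opening transaction.
        self.balances = {"A": deposit_a, "B": deposit_b}
        self.on_chain_txs = 1
        self.version = 0  # number of off-chain updates agreed so far
        self.closed = False

    def pay(self, sender, receiver, amount):
        """Off-chain update: shift balance between the two parties."""
        if self.closed:
            raise RuntimeError("channel is closed")
        if sender == receiver or sender not in self.balances \
                or receiver not in self.balances:
            raise ValueError("unknown or identical parties")
        if amount <= 0 or amount > self.balances[sender]:
            raise ValueError("payment must be positive and covered")
        self.balances[sender] -= amount
        self.balances[receiver] += amount
        self.version += 1

    def close(self):
        """Settle the latest balances with one closing on-chain transaction."""
        if self.closed:
            raise RuntimeError("channel is already closed")
        self.closed = True
        self.on_chain_txs += 1
        return dict(self.balances)
```

However many payments the two parties exchange, the chain only records the opening and closing transactions. The deposits bound how much can be moved, which is the cost mentioned above.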

Sharding vs. Off-chain

Straightforward as it might seem, it is actually quite hard to tell the difference between sharding and off-chain schemes. Some sharding schemes might also have a main chain or shared consensus among all shards, and some off-chain schemes might also divide nodes into groups. Here, we distinguish them in a more theoretical way.

In fact, the term “consensus” consists of two properties, consistency (agreement) and liveness (availability). The former means that two honest nodes should not disagree about the content of a message. The latter means that if an honest node knows a message, all other honest nodes will eventually know it as well. For both sharding and off-chain schemes, liveness is compromised since some messages will not be known by all honest nodes. The difference between them is the way they handle consistency. In particular, sharding guarantees consistency within a shard, with a certain degradation in security. On the other hand, off-chain approaches make no hard guarantee on consistency. Instead, consistency depends on some economic enforcement, like a deposit on the main chain and a penalty mechanism for when someone misbehaves off-chain.

VAPOR

Besides sharding and off-chain approaches, we have recently proposed another scale-out blockchain solution, called VAPOR [29]. The system is based on an important assumption called “rationality” that we observed in existing blockchain systems. In particular, we find that most blockchain systems consider a special type of message, i.e., transactions, and most systems implicitly assume that blockchain participants are rational toward transactions. For example, if Alice is rational and wants to buy something from Bob, then after she makes a payment transaction to Bob, she will be responsible for proving the authenticity of this transaction to Bob. And Bob, if he is rational, will only sell her goods after he verifies that the transaction is indeed confirmed and authentic. We call this “rationality in value transfer”. VAPOR utilizes the “rationality” in the value transfer system to scale out without compromising security and decentralization. In other words, VAPOR could be used as a fully secure and decentralized value transfer system, e.g., a cryptocurrency, without requiring each node to know, validate, and store all transactions. However, it has a limitation in functionality: it can only be used for value transfer, so that the “rationality” assumption holds.

Discussion

We hope the concept of blockchain scalability is somewhat clearer to you now. The most important take-home message is that the label “scalable blockchain” doesn’t say anything about a system’s scalability unless it is compared to Bitcoin, Bitcoin POW, classical BFT, or a non-scale-out blockchain.

Criteria for determining scalability

It is very hard to judge the “scalability” of a blockchain system without some theoretical background and experience in this field. However, I think the following three criteria can be used to judge whether a particular blockchain system enjoys the three types of scalability that have been discussed so far:

Is the blockchain using Bitcoin POW type of consensus? If yes, is there a constraint that nodes should always synchronize with the newest blocks or otherwise their mining power would be wasted? If yes, it is not a scalable POW.

Is the blockchain using BFT type of consensus? If yes, is there any smart trick that is used to reduce the message complexity? If no, then it is not a scalable BFT.

Does every piece of message need to be known by every validating/mining node? The node here means the nodes that are involved in consensus, i.e., the nodes who can generate blocks (a.k.a. miners in the context of cryptocurrencies). If yes, then it is not a scale-out blockchain.
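The three criteria above can be read as a small checklist. The sketch below encodes them as a hypothetical function; note that the criteria only say when a system is not scalable in a given sense, so treating the opposite answer as a positive label is a simplifying assumption made for this illustration.

```python
def scalability_labels(uses_bitcoin_style_pow,
                       must_sync_newest_block,
                       uses_bft,
                       reduces_message_complexity,
                       every_node_knows_every_message):
    """Return the types of scalability a system may claim, based on
    yes/no answers to the three criteria (simplified reading)."""
    labels = []
    # Criterion 1: POW without the "always sync the newest block" constraint.
    if uses_bitcoin_style_pow and not must_sync_newest_block:
        labels.append("scalable POW")
    # Criterion 2: BFT with some trick that lowers message complexity.
    if uses_bft and reduces_message_complexity:
        labels.append("scalable BFT")
    # Criterion 3: some validating/mining nodes never see some messages.
    if not every_node_knows_every_message:
        labels.append("scale-out")
    return labels


# Bitcoin as described in this article: POW with the sync constraint.
print(scalability_labels(True, True, False, False, True))
```

An empty list means the system does not pass any of the three checks, as is the case for Bitcoin and for classical PBFT.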

Quantify the scalability

Let me give a somewhat more concrete idea of scalability in terms of TPS. As we all know, if a blockchain does not scale out, every node participating in its consensus has to acquire all messages. The throughput of the system is then bounded by the least capable node in the network. Hence, the throughput of a home PC, e.g., 100–1000 TPS, would be a reasonable expectation of the maximum TPS that a fully decentralized blockchain can achieve. In other words, if a non-scale-out blockchain claims a throughput of 10,000 TPS, it suggests that the system is quite centralized, as nodes with lower capacity couldn’t join it. On the other hand, if a blockchain scales out, it could in theory achieve unbounded throughput. However, beware of the compromises in security, decentralization, or functionality, as it is impossible to achieve all of them simultaneously.
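The bound can be written down in two lines. The sketch below uses made-up node capacities and hypothetical function names purely for illustration: without scale-out, the achievable throughput is the capacity of the weakest participating node, and a higher claimed throughput implies that the weaker nodes cannot take part.

```python
def max_tps_without_scale_out(node_capacities_tps):
    """Every node must process every message, so the weakest node
    sets the ceiling for the whole network."""
    return min(node_capacities_tps)


def nodes_excluded(node_capacities_tps, claimed_tps):
    """Capacities of the nodes that cannot keep up with a claimed throughput."""
    return [c for c in node_capacities_tps if c < claimed_tps]


# Made-up capacities: three home PCs and two data-center servers.
capacities = [300, 500, 800, 20_000, 50_000]
print(max_tps_without_scale_out(capacities))
print(nodes_excluded(capacities, 10_000))
```

With these numbers, a claim of 10,000 TPS would leave only the two servers able to participate, which is the centralization effect described above.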

Layer 1 vs. layer 2

“Is layer 1 or layer 2 the best solution to scale blockchain?” is such a hot discussion that we can’t get around it when talking about “scalability”. However, although the two terms are seemingly closely related to this article, we have avoided directly addressing this topic or using either term. The reason is that these terms are also not very well defined, and since we are aiming to clear up the confusion around scalability, we try not to add even more confusion to the problem. Nevertheless, we give a brief description here.

In particular, “layer 1” is used to represent all efforts to scale blockchains by modifying the current consensus algorithms or proposing new ones, which includes all algorithms we described in this article except for off-chain schemes. However, as we already explained, the kinds of “scalability” they achieve are very different. On the other hand, “layer 2” approaches are basically off-chain schemes. Hence, it is not appropriate to compare “layer 1” and “layer 2” from the perspective of scalability, as only one category of “layer 1” approaches, i.e., sharding, achieves the same kind of “scalability” as “layer 2”.