Image Courtesy of CC Search. By Stan Levandovsky

Intro

As discussed in my previous article, Ethereum 2.0: A Complete Guide, one of the largest issues facing the Ethereum network today is scalability. The Ethereum network supports hundreds of decentralized applications and processes hundreds of thousands of transactions each day. As use of the Ethereum network has increased, with more Dapps being created and more transactions being carried out, the time and cost of transactions have continued to rise. This has made the Ethereum network slower and less convenient to use. In a recent speed test, the network managed just 20 transactions per second (various sources have stated anywhere between 12 and 45 transactions per second over the last year). In comparison, centralized networks such as PayPal and Visa regularly perform 193 and 1667 transactions per second respectively, and Visa’s maximum capacity is close to 24,000 transactions per second.¹ As use of the Ethereum network continues to expand, a robust and efficient mechanism to enable long-term scaling and mass adoption has become more and more important. The Ethereum community has been researching how to scale the Ethereum network since 2014, and both Ethereum and other blockchains have investigated several options to achieve scalability. More recently, the community chose sharding as the most promising method for scaling the network. Part One of this article will explore various methods that have been investigated to scale Ethereum and other blockchains. Some are unsatisfactory alternatives to sharding and some can be used in conjunction with it. Part Two will explore the details and roadmap of sharding itself.

The Blockchain Trilemma

In order to contextualize scaling Ethereum, one should be familiar with the “blockchain trilemma.” First proposed by Ethereum co-founder Vitalik Buterin, the trilemma states that a blockchain network can fully satisfy the parameters of only two of the following three properties: decentralization, security and scalability. An ideal solution to the trilemma would be a compromise between these three properties that does not detract from any of them to the extent that the blockchain is compromised or ceases to function. Currently, the Ethereum network is decentralized and secure, but lacks scalability. Attempts to scale the network must ensure that decentralization and security are not compromised if they are to offer a comprehensive, long-term solution.

Unsatisfactory Ideas for Scaling

One of the earliest ideas considered for scaling blockchains was to simply increase the size of each block. The size of a block on the Ethereum blockchain is determined by its gas limit, currently 8 million gas per block. The thinking goes that because the rate at which blocks are mined will remain the same, increasing the gas limit will allow more transactions to take place in the same period, increasing the network’s speed and throughput. Although this is technically true, increasing block size past a certain optimal point tends to create more problems than it solves, and does not actually increase throughput by a factor that would justify the associated negative impact. To begin with, larger blocks are harder to process and take longer to propagate, which raises latency across the network. This means only people with very advanced and expensive hardware will be able to successfully mine after a certain block size is reached.² Larger block sizes also mean more storage is needed, which increases the required capacity of each node. The current Ethereum blockchain is about 227GB, so doubling the block size would roughly double the rate at which this figure grows without bestowing a “significant increase” in transaction speed or throughput.³ This would again favor wealthier individuals who can afford better hardware with more storage. Finally, larger blocks make it harder for nodes to sync with the network. This provides still another advantage to individuals with more resources and makes network reconfiguration slower and more difficult in the event of an attack. All these factors mean that increasing block size beyond an optimal point leads to a greater risk of centralization, as blocks that are big enough to bestow a significant increase in throughput will concentrate power in the hands of the network’s wealthiest members and prevent average users with consumer-level hardware from running a node or mining Ether.⁴
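A rough back-of-the-envelope sketch can make the throughput side of this argument concrete. The 8 million gas limit comes from the paragraph above and 21,000 gas is the cost of a plain Ether transfer; the 15-second block time is an assumed round figure, and the function name is only illustrative. Real blocks also carry more expensive contract calls, so actual throughput would be lower than this upper bound.

```python
# Back-of-the-envelope throughput estimate. All figures are illustrative.
GAS_LIMIT = 8_000_000        # gas per block, as quoted in the text
GAS_PER_TRANSFER = 21_000    # gas cost of a plain Ether transfer
BLOCK_TIME_SECONDS = 15      # assumed average block interval


def transfers_per_second(gas_limit: int,
                         gas_per_tx: int = GAS_PER_TRANSFER,
                         block_time: float = BLOCK_TIME_SECONDS) -> float:
    """Upper bound on simple transfers per second for a given gas limit."""
    txs_per_block = gas_limit // gas_per_tx
    return txs_per_block / block_time


if __name__ == "__main__":
    for multiple in (1, 2, 10):
        tps = transfers_per_second(GAS_LIMIT * multiple)
        print(f"{multiple}x gas limit: about {tps:.1f} transfers/second")
```

Under these assumptions, even a tenfold increase in the gas limit yields only a few hundred simple transfers per second, which is still well below the Visa figures quoted in the introduction.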

Another incomplete method examined to increase scalability involves the use of many different altcoins to take load off the main chain. Instead of sending all transactions to a single main chain, transactions could be offloaded onto smaller chains, each with its own separate capacity. This has the potential to increase throughput by a steady factor, but at the cost of security. This is because altcoins with smaller market caps and lower values require fewer resources to attack or take over. As Vitalik points out, “…an N-factor increase in throughput using this method necessarily comes with an N-factor decrease in security.” Therefore, this method is only safe for small values of N, which means it cannot significantly increase scalability and remain secure.⁴

A similar tactic can be found in a process called “merge mining,” which is employed by projects such as Namecoin. Just like using many different altcoins, merge mining seeks to remove load from the mainchain and increase throughput by directing traffic to several different side chains. However, unlike altcoins, all the merge mining chains share the same miners and mining protocol (or, in a Proof of Stake system, the same validators). This increases the cost of a potential attack, which makes merge mining more secure than using many altcoins. The method also allows for an increase in throughput by a constant factor. However, an N-factor increase in throughput necessitates an N-factor increase in mining power, which leads to the same centralization risks encountered with increased block sizes. All of the above-mentioned ideas would help Ethereum to scale, but at too severe a cost to either decentralization or security. Therefore, they do not present satisfactory solutions to the blockchain trilemma and have been passed over by the Ethereum 2.0 team.
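The trade-offs of the altcoin and merge mining approaches can be put side by side in a small sketch. This is a deliberately simplified model, not a measurement: it assumes mining power (or stake) is split evenly across chains, and the class and function names are hypothetical. All values are relative to a single chain.

```python
from dataclasses import dataclass


@dataclass
class ScalingOutcome:
    throughput_multiplier: float   # total throughput relative to one chain
    attack_cost_per_chain: float   # cost to attack one chain, relative
    load_per_miner: float          # processing work per miner, relative


def many_altcoins(n: int) -> ScalingOutcome:
    """Mining power is divided among n independent chains (even split assumed)."""
    return ScalingOutcome(throughput_multiplier=n,
                          attack_cost_per_chain=1 / n,
                          load_per_miner=1)


def merge_mining(n: int) -> ScalingOutcome:
    """Every miner secures all n chains, so each miner does n times the work."""
    return ScalingOutcome(throughput_multiplier=n,
                          attack_cost_per_chain=1,
                          load_per_miner=n)


if __name__ == "__main__":
    for n in (2, 10, 100):
        print(n, many_altcoins(n), merge_mining(n))
```

In this model the altcoin approach pays for throughput with security, while merge mining pays for it with heavier hardware requirements, which is the centralization risk described above.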

Promising but Limited Ideas for Scaling

Recently, several more promising solutions that would increase scalability without seriously compromising security or decentralization have been investigated and are now being implemented by various projects. One such solution involves the use of advanced cryptography like ZK-SNARKs or Mimblewimble to speed up node verification. This system creates cryptographic proofs of blocks on the chain that can be verified more quickly and easily by each node. This solves the issue of “initial full node synchronization” by allowing nodes to verify only the proofs, rather than the entire chain from genesis. ZK-rollups and ZK-ZK-rollups are also being considered to help scale Ethereum transactions by a factor of up to 30.⁸ Although these tools can lead to an increase in transaction speed, they are far from a complete scaling solution, and Vitalik believes that the same result could be achieved more simply using cryptoeconomics instead of pure cryptography.

Plasma Chains

Another promising, if still only partial, solution has been presented in the form of “plasma chains.” Plasma chains are semi-independent blockchains that can be registered to, and commit transactions on, the Ethereum mainchain. These transactions can be committed to the mainchain one at a time or in batches.⁵ A helpful analogy for plasma chains is the purchase of chips at a casino. Customers can enter the casino with currency, buy however many chips they like, and use those chips as they wish, provided they follow the casino’s rules and guidelines. Once they have conducted whatever business they wish in the casino, they can cash out and convert the chips back into currency. A plasma chain on the Ethereum blockchain works the same way, except the “money” here is Ether and the chips are a specific altcoin that is native to the plasma chain.⁵ In this way, transactional demand is moved off the mainnet so that the main nodes can distribute their computing power in a more cost-effective and logical way.⁵
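The casino analogy can be sketched as a toy model. This is only an illustration under simplifying assumptions: the RootChain and PlasmaChain classes and their methods are hypothetical, balances are plain integers, and the exit and challenge mechanics of a real plasma implementation are left out entirely.

```python
import hashlib
import json


class RootChain:
    """Stand-in for the Ethereum mainchain (the casino's cashier)."""

    def __init__(self) -> None:
        self.locked_total = 0               # Ether held for the plasma chain
        self.commitments: list[str] = []    # one hash per committed batch


class PlasmaChain:
    """Hypothetical child chain that tracks balances off the mainchain."""

    def __init__(self, root: RootChain) -> None:
        self.root = root
        self.balances: dict[str, int] = {}
        self.pending: list[tuple[str, str, int]] = []

    def deposit(self, user: str, amount: int) -> None:
        """Buy chips: lock Ether on the root chain, credit plasma tokens."""
        self.root.locked_total += amount
        self.balances[user] = self.balances.get(user, 0) + amount

    def transfer(self, sender: str, receiver: str, amount: int) -> None:
        """Use the chips: move tokens without touching the root chain."""
        if amount <= 0 or self.balances.get(sender, 0) < amount:
            raise ValueError("invalid amount or insufficient balance")
        self.balances[sender] -= amount
        self.balances[receiver] = self.balances.get(receiver, 0) + amount
        self.pending.append((sender, receiver, amount))

    def commit_batch(self) -> str:
        """Commit one hash to the root chain covering every pending transfer."""
        digest = hashlib.sha256(json.dumps(self.pending).encode()).hexdigest()
        self.root.commitments.append(digest)
        self.pending = []
        return digest

    def withdraw(self, user: str) -> int:
        """Cash out: burn the user's tokens and release the same amount of Ether."""
        amount = self.balances.pop(user, 0)
        self.root.locked_total -= amount
        return amount
```

The point of the sketch is that any number of transfers on the child chain results in a single small commitment on the root chain, which is where the reduction in mainnet load comes from.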

Plasma chains are relatively secure because they are connected directly to the Ethereum mainchain and use Ethereum as a source of universal truth. Users can also rely on the security and rules of the Ethereum mainnet in order to redeem their Ether or plasma tokens. Plasma chains use a tool called MapReduce in conjunction with a Merkle trie construction to facilitate quick and easy fraud verification in the event of a Byzantine component (bad actor).⁶ Collin Cusce explains: “Those with stake in the Plasma Chain will self-monitor and file disputes in the form of proofs to the root chain when they can prove malfeasance.”⁶ Plasma chains even include a “roll-back” feature that will be activated if a “bonded truth of spend” request fails due to a fraud attempt by a malicious actor. If this happens, all funds will be returned to the state of the network before the attempted fraud. This roll-back is very cheap computationally, consuming only 2 bits of space on the parent blockchain.⁶ In this way, plasma chains allow the parent network to scale and perform better. At the same time, they maintain a reasonable degree of security by using the parent network and its associated tools and protocols as a single source of truth.⁶
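To give a feel for why proofs filed with the root chain can be small and cheap to check, here is a minimal sketch of a Merkle inclusion proof. It uses a plain binary Merkle tree with SHA-256 as a simplification; it is not Plasma's actual construction or Ethereum's own trie format, and the function names are hypothetical.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root_and_proof(leaves: list[bytes],
                          index: int) -> tuple[bytes, list[tuple[bytes, bool]]]:
    """Return the root and the sibling path for leaves[index].

    Each proof step is (sibling_hash, sibling_is_on_the_right).
    """
    if not leaves or not 0 <= index < len(leaves):
        raise ValueError("index must refer to an existing leaf")
    level = [_h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])          # duplicate the last node on odd levels
        sibling = index ^ 1                  # neighbour within the same pair
        proof.append((level[sibling], sibling > index))
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof


def verify(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the root from one leaf and its sibling path."""
    node = _h(leaf)
    for sibling, sibling_on_right in proof:
        node = _h(node + sibling) if sibling_on_right else _h(sibling + node)
    return node == root
```

A verifier that holds only the committed root can check whether a given transaction belongs to a batch using a number of hashes that grows with the logarithm of the batch size, rather than re-processing the whole batch.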

There is a lot more potential here for dynamic and secure scaling than in many of the other ideas discussed above. However, like any solution, plasma is not without its issues and compromises. To begin with, using plasma chains can only help Ethereum scale by a constant factor, rather than an exponential one.⁴ Although they are connected to and use the same source of truth as the Ethereum mainchain, plasma chains are not as secure as the Ethereum blockchain itself because they are susceptible to “denial of access” attacks. The issue is that smaller blockchains with fewer nodes and users are more susceptible to attack than huge ones with more nodes and users, like the Ethereum mainnet. If there were a large attack on a plasma chain, all the chain’s users would need to withdraw the funds held there back to the mainchain. If the number of users trying to withdraw their assets at the same time exceeds the short-term computational resources of the mainnet, any user who has been unable to withdraw at this point will experience a denial of access, a serious security issue. Ethereum’s Sharding FAQ explains that in such an event, “…there will not be enough space in the blockchain to process all withdrawals in time, and so the system will be insecure.”⁴ Luckily, there are ways to mitigate this insecurity. Withdrawal delays could be made flexible so that they are extended automatically if many withdrawals are simultaneously requested. This would help to secure user funds but would still allow a powerful attacker to lock up everyone’s funds for a long period of time. This “extended denial of service” is a security failure, but a much milder one than a “total loss of access.” It would also be less impactful if the plasma chain in question were used specifically for micro-transactions rather than transactions with high values.⁴ In fact, this is the sort of plasma chain that makes the most sense, since high transaction fees and times severely disincentivize micro-transactions on the mainchain. It is also important to note that none of the insecurities associated with plasma chains would affect the security of the Ethereum mainchain.
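The mass-withdrawal problem and the flexible-delay mitigation can be illustrated with some simple arithmetic. Every number and name below is hypothetical: the sketch assumes the mainnet can process a fixed number of withdrawals per block and that withdrawals must complete within a window measured in blocks.

```python
import math


def blocks_needed(pending_withdrawals: int, withdrawals_per_block: int) -> int:
    """Mainnet blocks required to process every pending withdrawal."""
    return math.ceil(pending_withdrawals / withdrawals_per_block)


def exit_outcome(pending_withdrawals: int,
                 withdrawals_per_block: int,
                 base_window_blocks: int,
                 flexible: bool) -> tuple[int, int]:
    """Return (window_in_blocks, users_left_unprocessed).

    With a fixed window, users who do not fit are locked out. With a
    flexible window, the delay stretches so that nobody is locked out,
    at the price of a longer wait for everyone.
    """
    needed = blocks_needed(pending_withdrawals, withdrawals_per_block)
    if needed <= base_window_blocks:
        return base_window_blocks, 0
    if flexible:
        return needed, 0
    unprocessed = pending_withdrawals - base_window_blocks * withdrawals_per_block
    return base_window_blocks, unprocessed


if __name__ == "__main__":
    # 10,000 users exit at once; the mainnet fits 100 withdrawals per block.
    print(exit_outcome(10_000, 100, 50, flexible=False))
    print(exit_outcome(10_000, 100, 50, flexible=True))
```

In this toy scenario the fixed window leaves half of the users unable to withdraw, while the flexible window doubles the waiting time instead, which mirrors the difference between a loss of access and an extended denial of service.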

Although plasma chains are still a relatively new and untested concept, their potential pros seem to outweigh the cons. When the “Cryptokitties” Dapp went live on the Ethereum mainchain, the resultant spike in transaction volume had a tangible, negative impact on the network’s usability, including increased transaction prices and wait times. If the Dapp had been built on a plasma chain, the Ethereum mainchain would have been free to carry on with more important and valuable transactions and its usability would have remained largely unaffected. Because of the relatively low value of assets and transactions on this hypothetical “kitty chain”, the incentive for an attack on it would be low. Even if an extended denial of service attack were carried out, the real-world impact on users would be annoying, but far from catastrophic. Vitalik agrees that the trilemma trade-off for plasma chains is a “different direction of trade-off” from the previously mentioned ideas, and is “arguably a much milder trade-off,” making plasma chains “a large improvement on the status quo.”⁴ Perhaps the best feature of this innovation is that plasma and sharding are not mutually exclusive. Ethereum 2.0 will feature both solutions, so users will be able to choose whether to use the sharded mainnet, a plasma chain, or both, depending on their individual needs and preferences.

State Channels

Another partial solution to scalability can be provided by channel-based strategies like the one used on Bitcoin’s Lightning Network or Ethereum’s Raiden Network. State channels feature many of the same advantages and trade-offs as plasma chains but operate quite differently on a technical level. Rather than setting up a separate blockchain that is intermittently connected to the Ethereum mainnet, Raiden is simply a network of different users who connect to the mainnet using Ethereum smart contracts. All users on the network who wish to transfer assets must deposit a certain amount of Ether into a special smart contract connected to the mainchain. The smart contract opens a channel on the Raiden Network that is directly connected to another user, and the deposit is used as collateral for transactions made by the depositor. This user can transfer as much Ether as they would like on the network, provided the total transfer amount is equal to or less than the deposit they made in the smart contract.⁷ No one except the two users who opened the channel has access to the associated smart contracts, and their deposits of Ether ensure that neither party engages in double spending or other forms of fraud. Both parties in a transaction are required to provide a digital signature which is tied to their Ether deposit and holds them accountable so that they cannot back out of a transaction once it has been confirmed. This ensures that the system is fair and consistent as well as secure.⁷
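The deposit-and-signature mechanism can be sketched as a toy two-party channel. This is not Raiden's actual protocol or API: the Channel class is hypothetical, and an HMAC over a per-user secret is used purely as a stand-in for a real digital signature (an actual channel would verify signatures against public keys in the on-chain contract).

```python
import hashlib
import hmac


def sign(secret: bytes, message: bytes) -> bytes:
    """Stand-in for a digital signature; NOT how real channels sign."""
    return hmac.new(secret, message, hashlib.sha256).digest()


class Channel:
    """Toy channel: deposits act as collateral, updates happen off-chain."""

    def __init__(self, deposits: dict[str, int], secrets: dict[str, bytes]):
        self.deposits = dict(deposits)    # locked in the on-chain contract
        self.secrets = secrets            # toy only: used to check both signers
        self.balances = dict(deposits)
        self.nonce = 0

    @staticmethod
    def encode(nonce: int, balances: dict[str, int]) -> bytes:
        return f"{nonce}:{sorted(balances.items())}".encode()

    def apply_update(self, nonce: int, balances: dict[str, int],
                     signatures: dict[str, bytes]) -> bool:
        """Accept a new state only if it is newer, conserves the total
        deposit, keeps every balance non-negative, and both parties signed."""
        if nonce <= self.nonce or set(balances) != set(self.deposits):
            return False
        if sum(balances.values()) != sum(self.deposits.values()):
            return False
        if any(value < 0 for value in balances.values()):
            return False
        message = self.encode(nonce, balances)
        for user, secret in self.secrets.items():
            expected = sign(secret, message)
            if not hmac.compare_digest(signatures.get(user, b""), expected):
                return False
        self.nonce, self.balances = nonce, dict(balances)
        return True

    def close(self) -> dict[str, int]:
        """Settle on the mainchain using the latest agreed balances."""
        return dict(self.balances)
```

Because every accepted update must conserve the combined deposit and keep balances non-negative, neither party can spend more than the collateral covers, and the increasing nonce prevents an old, more favourable state from being replayed.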

If a user wishes to transact with another individual using state channels, they do not necessarily need to open a channel with that individual. Instead, they can use intermediary channels between other users, as long as there is “at least one possible route that connects the parties through a network of possible channels…”⁷ In this way, transactions can be conducted with a high number of users, even if an individual has only opened one or two channels on the network.⁷ Transactions via direct payment channels do not require any sort of fee, just the security deposit. Intermediaries in transfers that require the use of more than one channel will be allowed to charge a small fee to others for the use of their channel. These fees are subject to a competitive free market, which is intended to help keep prices reasonably low and proportional to the value being sent. One of the strongest features of state channel solutions is that because transactions do not require global consensus like they do on the mainchain, transfers can be completed nearly instantaneously. Furthermore, the mainchain is not encumbered by state channel transactions in any way, except for the one-time creation, and eventual closing, of each channel.⁷ Like plasma chains, state channels are complementary to sharding, and both are being developed simultaneously. Unfortunately, also like plasma chains, channel-based solutions are susceptible to denial of service attacks. However, as I have already noted, these attacks can be mitigated so that they do not become a serious security failure. In short, the Raiden Network presents a usable channel-based approach that will enable significant, if not comprehensive, scalability without compromising security or decentralization to an unacceptable degree. The Sharding FAQ explains that “On-chain scaling via sharding (plus other techniques) and off-chain scaling via channels are arguably both necessary and complementary.”⁴
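Finding a route through intermediary channels can be sketched as a breadth-first search over the channel graph. This is only an illustration of the idea and not Raiden's actual routing algorithm: the find_route function and the channel data are hypothetical, and intermediary fees are ignored.

```python
from collections import deque
from typing import Optional


def find_route(channels: dict[tuple[str, str], int],
               sender: str, receiver: str, amount: int) -> Optional[list[str]]:
    """Return the shortest path of users able to forward `amount`, or None.

    `channels` maps (from_user, to_user) to the amount that from_user
    can still send to to_user over their shared channel.
    """
    neighbours: dict[str, list[str]] = {}
    for (src, dst), capacity in channels.items():
        if capacity >= amount:               # skip channels that are too small
            neighbours.setdefault(src, []).append(dst)

    queue = deque([[sender]])
    visited = {sender}
    while queue:
        path = queue.popleft()
        if path[-1] == receiver:
            return path
        for nxt in neighbours.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


if __name__ == "__main__":
    channels = {
        ("alice", "bob"): 5, ("bob", "alice"): 5,
        ("bob", "carol"): 3, ("carol", "bob"): 3,
    }
    print(find_route(channels, "alice", "carol", 2))
    print(find_route(channels, "alice", "carol", 4))
```

Here Alice can pay Carol through Bob without ever opening a channel with Carol, as long as every hop along the way has enough capacity for the amount being sent.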

To conclude, many layer 1 options have been explored to try to scale Ethereum and other blockchains without compromising their security or decentralization. Some have failed, and others have met with limited success. After much research, the Ethereum community has chosen sharding as the most promising means of achieving massive scalability and solving the blockchain trilemma. Plasma and state channel solutions are layer 2 options that will be used in conjunction with sharding.

Special thanks to Aidan Hyman, Greg Markou and Cayman Nava for reviewing this article and making many valuable suggestions. Furthermore, this article would not have been possible without the work done by many other great writers and researchers in the space. Thanks to all of those involved in the creation and publication of all the sources cited below!