As we have demonstrated, Ethereum’s history and transaction structure are very different from Bitcoin’s. Ethereum’s state is made up of accounts, and a transaction carries the information that triggers a modification of that state. State and transactions record completely different kinds of data, so neither is a superset or subset of the other. History and state are therefore data from two different dimensions, and the size of the transaction history has no causal relationship with the size of the state.

When a transaction modifies the state, a new state is created (the leaf nodes drawn with solid lines) and the old state is retained as historical state (the leaf nodes drawn with dashed lines). Ethereum’s history therefore consists of both transactions and historical states. Because history and state belong to different dimensions, each Ethereum block header commits to them separately: one Merkle root covers the block’s transactions and another covers the state (a third covers transaction receipts). (Bonus question: EOS uses a model similar to Ethereum’s account model, but its block header doesn’t include a state Merkle root. Is this a good thing or not?)
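To make the “two dimensions” concrete, here is a minimal sketch of a block header that commits to transactions and to state separately. It is a simplification with several assumptions. Balances are a plain dict. Transactions are dicts with hypothetical `from`/`to`/`value` fields. Encoding is JSON, and the tree is a plain SHA-256 binary Merkle tree. Ethereum itself uses RLP encoding and Merkle Patricia Tries. Note that `apply_tx` returns a new state and leaves the old one intact, which is how historical state arises.

```python
import hashlib
import json


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Simple binary Merkle root (duplicates the last node on odd-sized levels)."""
    if not leaves:
        return sha256(b"")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def encode(obj) -> bytes:
    # Deterministic encoding; a stand-in for Ethereum's RLP.
    return json.dumps(obj, sort_keys=True).encode()


def apply_tx(state: dict[str, int], tx: dict) -> dict[str, int]:
    """Return a NEW state; the old dict survives unchanged as historical state."""
    new_state = dict(state)
    new_state[tx["from"]] = new_state.get(tx["from"], 0) - tx["value"]
    new_state[tx["to"]] = new_state.get(tx["to"], 0) + tx["value"]
    return new_state


def state_root(state: dict[str, int]) -> bytes:
    return merkle_root([encode([k, v]) for k, v in sorted(state.items())])


def tx_root(txs: list[dict]) -> bytes:
    return merkle_root([encode(tx) for tx in txs])


genesis = {"alice": 100, "bob": 0}
txs = [{"from": "alice", "to": "bob", "value": 30}]

state = genesis
for tx in txs:
    state = apply_tx(state, tx)

# The header commits to the two dimensions separately.
header = {"tx_root": tx_root(txs).hex(), "state_root": state_root(state).hex()}
print(state)    # {'alice': 70, 'bob': 30}
print(genesis)  # {'alice': 100, 'bob': 0}
```

The transaction root depends only on the transaction list, and the state root depends only on the resulting balances. A large number of transactions can leave the state small, and vice versa.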

For this reason, every block and every account in Ethereum permanently occupies disk space on a node. Ethereum nodes support several sync modes. In Archive mode, all history and state are saved. Because “history” here includes both transaction history and state history, the total amount of data stored in Archive mode has now surpassed 2 TB. In the default mode, historical state is pruned, and only transaction history and current state are stored locally. The total is about 170 GB: roughly 160 GB of transaction history and 10 GB of current state.

Ethereum manages costs with the gas fee model, in which transactions of different sizes consume correspondingly different amounts of gas. When gas fees are determined, each EVM instruction accounts only for its computational overhead and its storage overhead. The gas limit of each block therefore indirectly limits the growth rate of history and state.
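A back-of-the-envelope calculation shows how the block gas limit caps the rate of state growth. The figures are illustrative assumptions:

- a block gas limit of 8,000,000, roughly the mainnet level in mid-2018;
- 20,000 gas to write a previously empty storage slot with `SSTORE`;
- 64 raw bytes per slot, ignoring trie overhead;
- a 15-second block interval.

```python
# Illustrative parameters; see the lead-in for the assumption behind each one.
BLOCK_GAS_LIMIT = 8_000_000  # gas per block, roughly the mid-2018 mainnet level
SSTORE_SET_GAS = 20_000      # gas to write a previously empty storage slot
BYTES_PER_SLOT = 64          # 32-byte key + 32-byte value, ignoring trie overhead
BLOCK_TIME_SECONDS = 15      # assumed average block interval


def max_new_slots_per_block(gas_limit: int = BLOCK_GAS_LIMIT) -> int:
    # Upper bound: spend the entire block on fresh SSTOREs
    # (ignores the 21,000 base cost that every transaction also pays).
    return gas_limit // SSTORE_SET_GAS


def max_state_growth_bytes_per_day(gas_limit: int = BLOCK_GAS_LIMIT) -> int:
    blocks_per_day = 24 * 60 * 60 // BLOCK_TIME_SECONDS
    return blocks_per_day * max_new_slots_per_block(gas_limit) * BYTES_PER_SLOT


print(max_new_slots_per_block())         # 400
print(max_state_growth_bytes_per_day())  # 147456000 (about 147 MB)
```

Under these assumptions, new storage slots can add at most about 147 MB per day. This caps only the rate. Over an open-ended number of days, it sets no ceiling on the total.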

Side note: A common misconception is that Ethereum’s “blockchain size” already exceeds 1 TB. As the analysis above shows, “blockchain size” is a very vague term. We arrive at such a large figure only if we include historical state in the calculation.

Discarding historical states causes no problems for full nodes. Any historical state can be recomputed from the Genesis block and the transaction history, setting aside the computation time this takes. The meaningful figure is therefore the amount of data a full node must store: about 200 GB for Bitcoin and 170 GB for Ethereum. The two require a similar amount of data, small enough to store easily on an ordinary cloud host. Contrary to popular belief, the decline in the number of Ethereum full nodes is not caused by growing storage requirements. The root cause is actually the cost of computation during syncing, but that is a subject for another time. Ethereum’s history (the gap between the current block timestamp and the Genesis block timestamp) is less than half as long as Bitcoin’s. We can therefore conclude that Ethereum’s history and state are growing much faster.
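The recomputation argument can be sketched in a few lines. A node that keeps only the Genesis state and the list of blocks can rebuild any historical state on demand by replaying transactions. The transfer-only transaction format and the function names are hypothetical simplifications, not Ethereum’s actual state transition function.

```python
from typing import Dict, List, Tuple

State = Dict[str, int]
Tx = Tuple[str, str, int]  # hypothetical transfer: (sender, receiver, value)


def apply_block(state: State, block: List[Tx]) -> State:
    """Apply one block's transfers and return a new state object."""
    new_state = dict(state)
    for sender, receiver, value in block:
        if new_state.get(sender, 0) < value:
            raise ValueError(f"insufficient balance for {sender}")
        new_state[sender] = new_state.get(sender, 0) - value
        new_state[receiver] = new_state.get(receiver, 0) + value
    return new_state


def state_at(genesis: State, history: List[List[Tx]], height: int) -> State:
    """Rebuild the state after `height` blocks by replaying history from genesis.

    Only genesis and the transaction history are stored; every historical
    state is derived on demand, trading computation for disk space.
    """
    state = dict(genesis)
    for block in history[:height]:
        state = apply_block(state, block)
    return state


genesis: State = {"alice": 100}
history: List[List[Tx]] = [
    [("alice", "bob", 40)],
    [("bob", "carol", 15)],
    [("alice", "carol", 10)],
]
print(state_at(genesis, history, 2))  # {'alice': 60, 'bob': 25, 'carol': 15}
print(state_at(genesis, history, 3))  # {'alice': 50, 'bob': 25, 'carol': 25}
```

Replaying to height `n` costs work proportional to `n` blocks. This is the computation time that the argument above sets aside.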

The Tragedy of the (Storage) Commons

“The tragedy of the commons” refers to a situation in which individuals acting without restriction deplete a shared resource, to the detriment of everyone, themselves included. Blockchain nodes share their disk space, a kind of common resource, so they are not immune to this phenomenon.

Blockchain nodes use three types of resources to process transactions: CPU, disk space, and network bandwidth. CPU and bandwidth are replenished with each block. A node can therefore be compensated for these replenishable resources with a one-time transaction fee. (For the relationship between transaction fees, computational complexity, and transaction size, see RFC0015, Appendix 1.)

Disk space, the third resource, is by contrast occupied for the long term. Once occupied, it cannot be used by another user unless the previous owner releases it. A node keeps occupied disk space in use indefinitely, yet the user occupying it pays nothing for that ongoing use. A transaction fee is a one-time payment and does not compensate for it. Users effectively gain permanent rights to a storage system with higher availability than Amazon S3. This unbounded, permanent storage cost is borne collectively by every full node in the network.

Because all kinds of dApps run on Ethereum, the tragedy of the commons is more widespread there than on Bitcoin. For example, at block 5700001 (May 30, 2018), the top five smart contracts by share of storage usage were:

1. EtherDelta: 5.09%
2. IDEX: 4.17%
3. CryptoKitties: 3.05%
4. ENS: 1.92%
5. EOS Sale: 1.73%

The EOS sale offers a good example of this problem. The sale has concluded and EOS tokens now trade on the EOS chain. Even so, the storage used by the sale contract remains a permanent part of every Ethereum full node. As a result, part of each full node’s disk space is effectively rendered unusable.

These examples show that, without proper management, blockchain disk space ends up being abused, whether intentionally or not. In a well-designed economic model, the user must bear the cost of storage. That cost should be proportional not only to the size of the occupied space but also to how long the owner occupies it.
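The difference between the two pricing models can be written down directly. In this sketch the rates and units are hypothetical. The point is only the shape of the formulas: a one-time fee depends on size alone, while the cost argued for above grows with size multiplied by occupancy time.

```python
from dataclasses import dataclass


@dataclass
class StorageUse:
    size_bytes: int
    duration_blocks: int  # how long the data stays on every full node


def one_time_fee(use: StorageUse, fee_per_byte: int) -> int:
    # Paid once at write time; duration plays no role.
    return use.size_bytes * fee_per_byte


def time_proportional_cost(use: StorageUse, rate_per_byte_block: int) -> int:
    # Cost grows with both the size of the data and how long it is kept.
    return use.size_bytes * use.duration_blocks * rate_per_byte_block


short_lived = StorageUse(size_bytes=1_000, duration_blocks=10)
long_lived = StorageUse(size_bytes=1_000, duration_blocks=1_000_000)

print(one_time_fee(short_lived, 5), one_time_fee(long_lived, 5))  # 5000 5000
print(time_proportional_cost(short_lived, 1))  # 10000
print(time_proportional_cost(long_lived, 1))   # 1000000000
```

Under a one-time fee, data kept for a million blocks costs the same as data kept for ten. Only the time-proportional model distinguishes them.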

State Explosion

Both historical state and current state consume storage resources. As this analysis has shown, Bitcoin and Ethereum limit the growth rate of history and state. Neither, however, has any means of controlling the total size of historical and current state.

As a result, data accumulates continuously. Over time, running a full node requires more and more disk space, and more disk space means higher cost. The most likely outcome is that fewer miners will run full nodes, which harms the decentralization of the blockchain. This is, of course, the last thing we want to see.

You might ask whether improvements in hardware could outpace the growth of historical and current state. That is extremely unlikely.

The chart above shows the growth of the Ethereum network, where state accumulation has grown exponentially. Bitcoin’s state grew from 0 to 3 GB in 10 years, while Ethereum’s went from 0 to 10 GB in 4 years. The problem of state accumulation is masked by the fact that we have not yet solved the blockchain’s scalability problem. Because blockchain remains a niche technology, State Explosion has not yet surfaced.
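Taking the figures above at face value, the average growth rates compare as follows. This is simple arithmetic on the numbers quoted in the text. Because the growth is described as exponential, an average over the whole period understates the current rate.

```python
def average_growth_gb_per_year(size_gb: float, years: float) -> float:
    return size_gb / years


bitcoin = average_growth_gb_per_year(3, 10)   # 0 -> 3 GB over 10 years
ethereum = average_growth_gb_per_year(10, 4)  # 0 -> 10 GB over 4 years

print(bitcoin, ethereum)             # 0.3 2.5
print(round(ethereum / bitcoin, 1))  # 8.3
```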

But what happens when we solve the scalability problem and blockchain technology reaches mass adoption? With millions of dApps and billions of users, how fast will blockchains’ historical and current state accumulate?

This is the crux of the State Explosion problem. It is rightly classified as a post-scalability issue, because it will become obvious only after the scalability problem is solved. We became aware of it while working on a permissioned chain project. Permissioned chains perform far better than public chains, so they have already reached the post-scalability phase.

So how do we solve State Explosion? Handling the accumulation of historical state is relatively easy. In the future, historical states could be compressed with decentralized checkpoints or zero-knowledge proofs. We can even drop historical state altogether and still keep the blockchain running normally. The much harder problem is the accumulation of current state, because a full node needs this data to run.

Some blockchain projects have offered solutions that partly address the problem. EOS RAM is a useful attempt. RAM represents the memory available on supernode servers, and accounts, contract state, and code all require a certain amount of EOS RAM to run.
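The quota idea behind EOS RAM can be sketched as a simple per-account ledger. The names are hypothetical and this is not EOS’s implementation. Real EOS RAM is bought through an on-chain market, while here `buy` simply credits quota. The sketch only shows how storing data draws down a purchased allowance and how deleting data returns it.

```python
class InsufficientRam(Exception):
    pass


class RamLedger:
    """Hypothetical per-account RAM quota, loosely modeled on the EOS RAM idea.

    Not EOS's implementation: real EOS RAM is bought through an on-chain
    market, while here quota is simply credited by `buy`.
    """

    def __init__(self) -> None:
        self.quota: dict[str, int] = {}
        self.used: dict[str, int] = {}

    def buy(self, account: str, nbytes: int) -> None:
        self.quota[account] = self.quota.get(account, 0) + nbytes

    def store(self, account: str, nbytes: int) -> None:
        # Accounts, contract state and code all draw on the same quota.
        used = self.used.get(account, 0)
        if used + nbytes > self.quota.get(account, 0):
            raise InsufficientRam(f"{account} lacks quota for {nbytes} bytes")
        self.used[account] = used + nbytes

    def free(self, account: str, nbytes: int) -> None:
        # Deleting data releases quota for reuse by the same account.
        self.used[account] = max(0, self.used.get(account, 0) - nbytes)


ledger = RamLedger()
ledger.buy("alice", 1024)
ledger.store("alice", 800)
try:
    ledger.store("alice", 300)  # 800 + 300 > 1024
except InsufficientRam as err:
    print(err)  # alice lacks quota for 300 bytes
ledger.free("alice", 200)
ledger.store("alice", 300)      # 600 + 300 fits
print(ledger.used["alice"])     # 900
```

The ledger deliberately has no transfer or rental operation, mirroring the limitations of RAM discussed next.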

But the design of RAM has problems too. RAM must be purchased through a built-in trading market, it is not transferable, and it cannot be rented. In addition, the short-term memory needed during contract execution and the long-term storage needed for contract state are mixed together.

There are also no rules governing the total amount of RAM. The total depends on what hardware configurations the supernodes can support, rather than on the cost of consensus space.

The Ethereum community has noticed the State Explosion problem as well and has proposed a Storage Rent solution. Users prepay rent to use storage, and that rent is drawn down for as long as the storage is occupied. The longer the storage is held, the more rent the user pays.
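A minimal sketch of prepaid rent shows both the mechanism and the moment the first problem listed below begins. The sketch assumes a flat, hypothetical rate per byte per block. Once `balance` reaches zero, the protocol must decide what happens to the data. The class and its fields are illustrative names, not part of any Ethereum proposal’s specification.

```python
from dataclasses import dataclass


@dataclass
class RentedStorage:
    size_bytes: int
    balance: int              # prepaid rent, in hypothetical fee units
    rate_per_byte_block: int  # hypothetical rent rate

    def __post_init__(self) -> None:
        if self.size_bytes <= 0 or self.rate_per_byte_block <= 0:
            raise ValueError("size and rate must be positive")

    def rent_per_block(self) -> int:
        return self.size_bytes * self.rate_per_byte_block

    def charge(self, blocks: int) -> int:
        """Deduct rent for `blocks` elapsed blocks and return what was collected."""
        collected = min(self.rent_per_block() * blocks, self.balance)
        self.balance -= collected
        return collected

    def blocks_until_empty(self) -> int:
        return self.balance // self.rent_per_block()


slot = RentedStorage(size_bytes=100, balance=10_000, rate_per_byte_block=1)
print(slot.blocks_until_empty())  # 100
print(slot.charge(60))            # 6000
print(slot.charge(60))            # 4000: only the remaining balance is collected
print(slot.balance)               # 0, and now the protocol must decide what to do
```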

But there are two problems with this Storage Rent solution:

First, the prepaid rent will eventually run out. What happens then? To solve this problem, Storage Rent must be supplemented by mechanisms such as resurrection. These mechanisms increase the complexity of the design, sharply reduce the immutability of smart contracts, and hurt the user experience.

Second, Ethereum’s state model is shared rather than First-Class State: state belongs to contracts, not to the users it describes. Take an ERC20 token as an example. The balances of all users are recorded in the storage of a single ERC20 contract. In this case, who pays the state rent: the contract’s deployer, or each token holder?
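The accounting difference can be sketched as follows. The names are hypothetical, and each balance entry is assumed to take 64 bytes. The point is only whom the protocol can see as the owner of the storage. In the shared model, usage is attributable only to the contract. In a first-class model, each holder owns its own record and could be charged for it.

```python
from collections import Counter
from typing import Dict, List, Tuple

SLOT_BYTES = 64  # assumed: 32-byte key + 32-byte value per balance entry


def usage_shared(contract: str, balances: Dict[str, int]) -> Dict[str, int]:
    """Shared model: every holder's balance is a slot in one contract's storage,
    so protocol-level accounting sees only the contract."""
    return {contract: len(balances) * SLOT_BYTES}


def usage_first_class(records: List[Tuple[str, int]]) -> Dict[str, int]:
    """First-class model (hypothetical): each (owner, amount) record is a separate
    state object owned by its holder, so usage is attributable per owner."""
    usage: Counter = Counter()
    for owner, _amount in records:
        usage[owner] += SLOT_BYTES
    return dict(usage)


balances = {"alice": 500, "bob": 250, "carol": 100}
print(usage_shared("token_contract", balances))   # {'token_contract': 192}
print(usage_first_class(list(balances.items())))  # {'alice': 64, 'bob': 64, 'carol': 64}
```

In the shared model the protocol has no per-holder figure to bill, which is why the question of who pays has no obvious answer.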

Nervos is Built with State Explosion in Mind

Solving the State Explosion problem is one of the goals of Nervos CKB. For this reason, CKB takes a completely different approach to blockchain design, with many innovations. We are using a multi-layer architecture in which only high-value assets are stored on Layer 1, while transactions take place on Layer 2.

Part of the rationale is that this reduces the amount of data that must be stored locally. Blockchain has not yet solved its scalability problem, but we are building CKB for the long term, so we must address State Explosion now.

You can find more details on Nervos, its architecture, and its vision on our GitHub.