by Andre Cronje

First, let’s identify each area of storage:

Smart Contract State

DLT State

Immutable Data Storage

Mutable Data Storage

Universal State

State is stored on an account level. An account has ownership of its storage. This storage is captured in a trie.

Understanding Smart Contract State

State is a mapping between addresses (160-bit identifiers) and account states. Account states are Recursive Length Prefix (RLP) encoded data structures. This mapping is maintained in a modified Merkle Patricia tree. The root node of this structure is cryptographically dependent on all internal data.

The account state consists of the following fields:

Nonce

Balance

StorageRoot

CodeHash

StorageRoot is a 256-bit hash of the root node of a Merkle Patricia tree that encodes the account’s storage as a mapping between 256-bit integer values.
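To make the shape of this mapping concrete, here is a minimal sketch of an account record and an address-to-account map. The names `Address`, `Hash`, `Account`, `NewAccount`, and `Digest` are hypothetical, and SHA-256 over a fixed field order is an assumption standing in for RLP encoding and the Merkle Patricia tree; the only point is that the digest depends on every field.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"math/big"
)

// Address is a 160-bit account identifier.
type Address [20]byte

// Hash is a 256-bit digest.
type Hash [32]byte

// Account holds the four fields of the account state.
type Account struct {
	Nonce       uint64
	Balance     *big.Int
	StorageRoot Hash
	CodeHash    Hash
}

// NewAccount is a hypothetical helper that builds an account with empty
// storage and code.
func NewAccount(nonce uint64, balance int64) Account {
	return Account{Nonce: nonce, Balance: big.NewInt(balance)}
}

// Digest commits to all four fields. Assumption: SHA-256 over a fixed
// field order stands in for RLP encoding and trie hashing.
func (a Account) Digest() Hash {
	h := sha256.New()
	var nonce [8]byte
	binary.BigEndian.PutUint64(nonce[:], a.Nonce)
	h.Write(nonce[:])
	h.Write(a.Balance.Bytes())
	h.Write(a.StorageRoot[:])
	h.Write(a.CodeHash[:])
	var out Hash
	copy(out[:], h.Sum(nil))
	return out
}

func main() {
	state := map[Address]Account{}
	alice := Address{0x01}

	state[alice] = NewAccount(0, 100)
	before := state[alice].Digest()

	// Changing any field (here the nonce) changes the digest.
	state[alice] = NewAccount(1, 100)
	after := state[alice].Digest()

	fmt.Println("digest changed:", before != after)
}
```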

Understanding DLT State

The current design of blockchains requires replaying all data from transaction index 0, which in turn requires the full chain data to be stored. This is how a node arrives at the current, accurate UTXO set or state. Every transaction created needs to be stored, shared, and computed. The current distribution mechanism for this is blocks.

This design has no built-in archiving strategy: the ledger grows without bound, and more storage must be added as a consequence.

This further raises the barrier to entry, leading to less decentralization. A mobile device currently cannot participate in the Ethereum network unless it can store 1 TB of data.

Append-only ledgers grow infinitely. We propose to investigate multiple archiving strategies. The following strategies have been identified:

Signature snapshots

Mimblewimble (UTXO only)

State transition proofs

Signature snapshots (HashGraph & Algorand)

There is a shared state that is maintained by every node in the ledger (or in the shard, when the ledger is sharded). At the end of each round, each node calculates the shared state after processing all transactions that were received in that round and before. It then digitally signs a hash of that shared state, puts it in a transaction, and gossips it out to the community. Then it collects those signatures from all other nodes.

In this way, a node can have a copy of the state with a set of signatures that proves to a third party that this is the true, consensus state. This allows the node to construct a small file which is a verifiable proof that the state was truly the consensus.

The state is organized as a Merkle tree, so a third party can be given a proof that consists of a small part of the state, plus the path from there to the root of the Merkle tree (including siblings of those vertices in the tree), plus the signatures, and an address book history for the public keys.
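The Merkle part of such a proof can be sketched as follows. This is an illustration under stated assumptions, not the scheme any particular ledger uses: SHA-256 with one-byte leaf/node prefixes is assumed as the hash, the names `ProofStep`, `VerifyProof`, and `buildExample` are hypothetical, and the signatures and address book history are left out entirely.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// ProofStep is one sibling on the path from a leaf to the root.
type ProofStep struct {
	Sibling []byte
	Left    bool // true when the sibling sits to the left of the running hash
}

// hashLeaf and hashNode use distinct prefixes so that a leaf can never be
// mistaken for an internal node.
func hashLeaf(data []byte) []byte {
	h := sha256.Sum256(append([]byte{0x00}, data...))
	return h[:]
}

func hashNode(left, right []byte) []byte {
	buf := append([]byte{0x01}, left...)
	buf = append(buf, right...)
	h := sha256.Sum256(buf)
	return h[:]
}

// VerifyProof recomputes the root from a piece of state and its sibling
// path, and compares it with the root that the nodes signed.
func VerifyProof(leaf []byte, path []ProofStep, root []byte) bool {
	cur := hashLeaf(leaf)
	for _, step := range path {
		if step.Left {
			cur = hashNode(step.Sibling, cur)
		} else {
			cur = hashNode(cur, step.Sibling)
		}
	}
	return bytes.Equal(cur, root)
}

// buildExample builds a four-leaf tree and returns its root together with
// the third leaf and the proof path for it.
func buildExample() (root []byte, leaf []byte, path []ProofStep) {
	leaves := [][]byte{
		[]byte("alice:10"), []byte("bob:5"), []byte("carol:7"), []byte("dave:1"),
	}
	h := make([][]byte, len(leaves))
	for i, l := range leaves {
		h[i] = hashLeaf(l)
	}
	n01 := hashNode(h[0], h[1])
	n23 := hashNode(h[2], h[3])
	root = hashNode(n01, n23)
	path = []ProofStep{
		{Sibling: h[3], Left: false}, // sibling leaf on the right
		{Sibling: n01, Left: true},   // sibling subtree on the left
	}
	return root, leaves[2], path
}

func main() {
	root, leaf, path := buildExample()
	fmt.Println("valid proof:", VerifyProof(leaf, path, root))
	fmt.Println("tampered leaf:", VerifyProof([]byte("carol:8"), path, root))
}
```

The proof carries two hashes for a four-leaf tree; in general its length grows with the depth of the tree rather than with the size of the state.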

The proof must also include an “address book”, which is a list of the public keys of all the members, along with each member’s stake (owned directly or by proxy). A third party will need this address book in order to check the signatures on the state (or portion of state).

The proof must also include an “address book history”. This is a sequence of address books, where each address book is signed by members from the previous address book. Any given address book must be signed by a set of members that own more than 2/3 of the stake, according to the membership and stake from the previous address book. This chain of address books extends back to the genesis address book, which is the initial members who created the ledger at the beginning.
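One link of that chain can be checked roughly as below. This is a sketch under assumptions: Ed25519 signatures and a SHA-256 digest of the address book are chosen only for illustration, and `Member`, `AddressBook`, `SignedBySupermajority`, and `demo` are hypothetical names. The rule it implements is the one stated above: signers must hold strictly more than 2/3 of the stake recorded in the previous address book.

```go
package main

import (
	"crypto/ed25519"
	"crypto/sha256"
	"fmt"
)

// Member is one entry of an address book.
type Member struct {
	Key   ed25519.PublicKey
	Stake uint64
}

// AddressBook lists all members with their stake.
type AddressBook []Member

// Hash is an illustrative digest over keys and stakes, in order.
func (ab AddressBook) Hash() []byte {
	h := sha256.New()
	for _, m := range ab {
		h.Write(m.Key)
		fmt.Fprintf(h, ":%d;", m.Stake)
	}
	return h.Sum(nil)
}

// SignedBySupermajority reports whether members of prev holding strictly
// more than 2/3 of prev's stake signed the hash of next. sigs[i] is the
// signature from prev[i], or nil if that member did not sign.
func SignedBySupermajority(prev, next AddressBook, sigs [][]byte) bool {
	msg := next.Hash()
	var total, signed uint64
	for i, m := range prev {
		total += m.Stake
		if i < len(sigs) && sigs[i] != nil && ed25519.Verify(m.Key, msg, sigs[i]) {
			signed += m.Stake
		}
	}
	return 3*signed > 2*total
}

// demo builds four equal-stake members, derives a successor address book
// in which one member has left, and has the first `signers` members sign it.
func demo(signers int) bool {
	var prev AddressBook
	var privs []ed25519.PrivateKey
	for i := 0; i < 4; i++ {
		seed := sha256.Sum256([]byte{byte(i)}) // deterministic keys for the example
		priv := ed25519.NewKeyFromSeed(seed[:])
		privs = append(privs, priv)
		prev = append(prev, Member{Key: priv.Public().(ed25519.PublicKey), Stake: 25})
	}
	next := append(AddressBook{}, prev[:3]...)
	sigs := make([][]byte, len(prev))
	for i := 0; i < signers; i++ {
		sigs[i] = ed25519.Sign(privs[i], next.Hash())
	}
	return SignedBySupermajority(prev, next, sigs)
}

func main() {
	fmt.Println("3 of 4 signers (75% of stake):", demo(3))
	fmt.Println("2 of 4 signers (50% of stake):", demo(2))
}
```

Verifying a full history would repeat this check for each consecutive pair of address books, starting from the genesis address book.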

The hash of the genesis address book is important. It serves as a unique identifier of the ledger. It is the “name” of the ledger.

If a small number of members want to split off from the group and create a new ledger that is a fork of the current one, they have the technical ability to do so, and can even create the initial state of their new ledger to be identical to the old ledger. So it is a fork. However, they will not be able to create an address book history reaching back to the genesis address book, with the members of each address book signing the next one, because the majority of members (who are not forking) will not sign the address book for the minority of members who are forking. This forces the new fork to have a new genesis address book, and therefore a new unique identifier, and therefore a new name. Consequently, those creating the fork will be unable to fool anybody into thinking the fork is the legitimate ledger.

When a client submits a transaction to a node to send to the ledger, the client receives in response from the node the cryptographic proof that their transaction has affected the shared state correctly.

When Alice transfers cryptocurrency to Bob, both of them can receive a cryptographic proof that the transaction succeeded. This proof includes the signatures reaching back to the genesis address book.

So they not only verify that the transfer occurred, they verify that it occurred on the correct ledger. If a ledger forks, no client will ever be confused about which ledger they are dealing with because only one ledger at a time can have that name.

Furthermore, if a 50/50 split were to happen, then neither side would be able to prove a connection to the genesis address book. It wouldn’t be a fork; it would be the complete destruction of one ledger, and the creation of two unrelated ledgers. This would greatly reduce the value to the nodes, because they would no longer be able to earn fees from the clients who want to access the original ledger. And all of the original cryptocurrency would, in a very real sense, cease to exist. This creates an enormous disincentive to forking.

In this way, confusing forks simply become impossible. Non-confusing forks become unappealing for the nodes. So there are strong incentives to avoid forks, even aside from any legal incentives. The cryptographic proofs and unique identifiers are also critically important for secure sharding. They allow shards to send each other messages, with assurance that the message from a given shard was truly the consensus of that shard.

Mimblewimble

Bitcoin is categorized as a UTXO-based system, where UTXO stands for Unspent Transaction Output. A transaction, with its inputs and outputs, can be described as:

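    type Transaction struct {
        ID   []byte     // transaction identifier
        Vin  []TXInput  // inputs: references to the outputs being spent
        Vout []TXOutput // outputs: newly created spendable values
    }

    type TXInput struct {
        Txid      []byte // ID of the transaction holding the referenced output
        Vout      int    // index of that output within the referenced transaction
        Signature []byte // signature authorizing the spend
        PubKey    []byte // public key of the spender
    }

    type TXOutput struct {
        Value      int    // amount carried by this output
        PubKeyHash []byte // hash of the public key allowed to spend it
    }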

A transaction consists of inputs (value received by the account) and outputs (value sent from the account).

So if Alice has input 1 BTC from Bob, then Bob has output 1 BTC to Alice. If Alice then wishes to send 0.5 BTC back to Bob, and 0.5 BTC to Charlie, Alice would create a transaction with 1 BTC input from Bob, 0.5 output to Bob, and 0.5 output to Charlie.

The Mimblewimble specification has the concept of UTXO compacting. Called cut-through, it is explained as follows:

Blocks let miners assemble multiple transactions into a single set that’s added to the chain. In the following block representations, containing 3 transactions, we only show inputs and outputs of transactions. Inputs reference outputs they spend. An output included in a previous block is marked with a lower-case x.

    I1(x1) --- O1
            |- O2

    I2(x2) --- O3
    I3(O2) -|

    I4(O3) --- O4
            |- O5

We notice the two following properties:

Within this block, some outputs are directly spent by included inputs (I3 spends O2 and I4 spends O3).

The structure of each transaction does not actually matter. As all transactions individually sum to zero, the sum of all transaction inputs and outputs must be zero.

Similarly to a transaction, all that needs to be checked in a block is that ownership has been proven (which comes from transaction kernels) and that the whole block did not add any money supply (other than what’s allowed by the coinbase). Therefore, matching inputs and outputs can be eliminated, as their contribution to the overall sum cancels out. Which leads to the following, much more compact block:

    I1(x1)    | O1
    I2(x2)    | O4
              | O5

Through compacting, Mimblewimble can reduce all UTXOs to single pairs: balance outputs. This allows the chain to be drastically compacted.
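The cut-through step on the three-transaction block above can be sketched as a simple set operation. This is only an illustration: the `Tx` type and the `CutThrough` and `exampleBlock` functions are hypothetical, outputs are identified by plain labels, and the amounts, commitments, and transaction kernels that a real Mimblewimble block carries are left out.

```go
package main

import (
	"fmt"
	"sort"
)

// Tx lists the outputs a transaction spends and the outputs it creates.
type Tx struct {
	Spends  []string
	Creates []string
}

// CutThrough drops every output that is both created and spent inside the
// block, returning the remaining inputs and outputs in sorted order.
func CutThrough(txs []Tx) (inputs, outputs []string) {
	created := map[string]bool{}
	spent := map[string]bool{}
	for _, tx := range txs {
		for _, o := range tx.Creates {
			created[o] = true
		}
		for _, s := range tx.Spends {
			spent[s] = true
		}
	}
	for s := range spent {
		if !created[s] { // spends an output from an earlier block
			inputs = append(inputs, s)
		}
	}
	for o := range created {
		if !spent[o] { // still unspent at the end of the block
			outputs = append(outputs, o)
		}
	}
	sort.Strings(inputs)
	sort.Strings(outputs)
	return inputs, outputs
}

// exampleBlock encodes the three transactions from the diagram.
func exampleBlock() []Tx {
	return []Tx{
		{Spends: []string{"x1"}, Creates: []string{"O1", "O2"}},
		{Spends: []string{"x2", "O2"}, Creates: []string{"O3"}},
		{Spends: []string{"O3"}, Creates: []string{"O4", "O5"}},
	}
}

func main() {
	in, out := CutThrough(exampleBlock())
	fmt.Println("inputs: ", in)
	fmt.Println("outputs:", out)
}
```

O2 and O3 disappear together with the inputs that spent them, leaving x1 and x2 as inputs and O1, O4, and O5 as outputs, which matches the compact block.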

The same can be achieved with finalized transactions and state output.

We redefine a transaction as the sum of its inputs and outputs and derive a current state. We can group all similar transactional outputs into single inputs for state transitions.

If Alice sent Bob 0.5 BTC and Charlie 0.5 BTC, followed by Bob sending 0.25 BTC to Charlie, this can be represented as Alice sending 0.25 BTC to Bob and 0.75 BTC to Charlie. This is the concept behind compacting.
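The equivalence in this example is easy to check by summing each account’s balance change. The sketch below uses hypothetical names (`Transfer`, `NetBalances`, `sameNet`) and expresses amounts in satoshis (1 BTC = 100,000,000) to avoid floating point.

```go
package main

import "fmt"

// Transfer moves Amount satoshis from one account to another.
type Transfer struct {
	From, To string
	Amount   int64
}

// NetBalances sums the balance change of every account over all transfers.
func NetBalances(transfers []Transfer) map[string]int64 {
	net := map[string]int64{}
	for _, t := range transfers {
		net[t.From] -= t.Amount
		net[t.To] += t.Amount
	}
	return net
}

// sameNet reports whether two sets of balance changes are identical,
// treating a missing account as a change of zero.
func sameNet(a, b map[string]int64) bool {
	for k, v := range a {
		if b[k] != v {
			return false
		}
	}
	for k, v := range b {
		if a[k] != v {
			return false
		}
	}
	return true
}

// original is the full history; compacted is the shorter representation.
func original() []Transfer {
	return []Transfer{
		{"Alice", "Bob", 50000000},
		{"Alice", "Charlie", 50000000},
		{"Bob", "Charlie", 25000000},
	}
}

func compacted() []Transfer {
	return []Transfer{
		{"Alice", "Bob", 25000000},
		{"Alice", "Charlie", 75000000},
	}
}

func main() {
	fmt.Println("original: ", NetBalances(original()))
	fmt.Println("compacted:", NetBalances(compacted()))
	fmt.Println("equivalent:", sameNet(NetBalances(original()), NetBalances(compacted())))
}
```

Three transfers collapse into two while every account ends with the same balance change.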

Applied over the entire blockchain, we can considerably reduce the number of transactions that need to be replayed to arrive at the current state, reducing both size and sync time.

State Proofs

We have a computation c(a,b) to execute, and we give it to untrusted parties. We assume standard 2n/3 fault tolerance, so we send c(a,b) to 3 parties. If 2 parties return the same result for c(a,b), we assume that result is correct. This is verifiable computing.
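The replicated form of verifiable computing described here can be sketched as follows. The names `Worker` and `Accept` are hypothetical, and reading “2n/3” as “at least two thirds of the parties must agree” is an assumption made to match the 2-of-3 example.

```go
package main

import "fmt"

// Worker is an untrusted party that evaluates c(a, b).
type Worker func(a, b int) int

// Accept sends c(a, b) to every worker and returns the result that at
// least 2/3 of them agree on. The second return value is false when no
// result reaches that threshold.
func Accept(workers []Worker, a, b int) (int, bool) {
	counts := map[int]int{}
	for _, w := range workers {
		counts[w(a, b)]++
	}
	for result, n := range counts {
		if 3*n >= 2*len(workers) {
			return result, true
		}
	}
	return 0, false
}

// honest computes c(a, b) = a + b; the other two workers are faulty.
func honest(a, b int) int    { return a + b }
func offByOne(a, b int) int  { return a + b + 1 }
func alwaysZero(a, b int) int { return 0 }

func main() {
	result, ok := Accept([]Worker{honest, honest, offByOne}, 2, 3)
	fmt.Println("two honest workers:", result, ok)

	_, ok = Accept([]Worker{honest, offByOne, alwaysZero}, 2, 3)
	fmt.Println("one honest worker, accepted:", ok)
}
```

The cost of this approach is that every party repeats the full computation, which is what the proof-based forms below aim to avoid.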

There are a few forms of verifiable computing. Our two focus areas are:

Intel SGX (Also known as trusted hardware)

zk-SNARKs (zero-knowledge succinct non-interactive argument of knowledge)

zk-SNARKs prove that a statement is true without revealing anything beyond the fact that it is true.

EVM execution occurs on-chain because we use multi-party consensus to verify computing. If we instead provide a zero-knowledge proof that execution occurred in a trusted manner, execution no longer needs to occur on-chain.

A zero-knowledge proof EVM can ensure verified computing.

We have secured execution, but we still have outputs, for example ERC20 addresses and balances. We have a zk-proof that proves that a state transition occurred.

We know state s1, and if we can prove that the transition occurred, we can prove s2. We need s1 and transactions tn to prove s2. Given Merkle root m and transition proof π, we can prove that participant p has balance b.

To have verified data, all the chain needs to save is Merkle root m and proof π.

    t: (s1, tn) → s2, with Merkle root m
    setup(t) → (pk, vk)
    t with witness w
    π = prove(pk, tn, w)
    verify(vk, tn, π)

Consider a block: a block is our proof of the transactions in it, and therefore proof of a state transition. State s1, when block bn is applied, gives us state s2, so bn is a state transition. We showed how a state transition can be represented as a zero-knowledge proof π. So instead of a block, we could represent this state transition with π.

Ethereum block size is currently 25k bytes; a proof π is 288 bytes. At the time of writing, Ethereum is 1 099 511 627 776 bytes. By simply replacing the blocks with zero-knowledge blocks, you would decrease the system to 12 195 622 907 bytes, roughly 1% of the current size.
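As a rough check of this arithmetic, the sketch below assumes every block is exactly 25,000 bytes and is replaced one-for-one by a 288-byte proof; `CompactedSize` is a hypothetical helper. With these rounded inputs the estimate comes out near 12.7 GB, a little over 1% of the original and the same order of magnitude as the figure quoted above; the difference presumably comes from rounding the block size to 25k.

```go
package main

import "fmt"

// CompactedSize estimates the ledger size after replacing every block of
// blockBytes with a proof of proofBytes. It scales the ledger by the ratio
// proofBytes/blockBytes using integer arithmetic.
func CompactedSize(ledgerBytes, blockBytes, proofBytes int64) int64 {
	return ledgerBytes * proofBytes / blockBytes
}

func main() {
	const ledger = 1099511627776 // bytes, as quoted above
	const block = 25000          // bytes per block (rounded assumption)
	const proof = 288            // bytes per proof

	size := CompactedSize(ledger, block, proof)
	fmt.Printf("%d bytes (%.2f%% of the original)\n",
		size, 100*float64(size)/float64(ledger))
}
```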

The above also assumes you want to keep each state transition and Merkle root from the beginning of time, which isn’t required. This is a different form of verified computing: a full deterministic transaction history isn’t required. All that is required is the most recent Merkle root and the most recent π.

We now define SNARKs for state transition systems. At a high level, we would like poly(λ)-size proofs (which are verifiable in poly(λ) time) which prove statements of the form “there exists a sequence of transitions t1, . . . , tk : T such that update(tk, update(tk−1, . . . , update(t1, σ1), . . .)) = σ2”.

In other words, we would like succinct certificates of the existence of state-transition sequences joining two states. The application to blockchains is the following: we will take our state to be the database of accounts (along with some metadata needed for correctly validating new blocks) and transitions to be blocks containing transactions.
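The statement being certified can be written out as ordinary code. The sketch below is not a SNARK: it is the naive check that replays every block, which is exactly the work a succinct proof would let a verifier skip. The types `State`, `Tx`, and `Block` and the functions `Update` and `Joins` are hypothetical, and the state is simplified to a map of balances.

```go
package main

import "fmt"

// State is a simplified database of accounts: name -> balance.
type State map[string]int64

// Tx moves Amount from one account to another.
type Tx struct {
	From, To string
	Amount   int64
}

// Block is one transition: an ordered list of transactions.
type Block []Tx

// Update applies one block to a state and returns the new state. It
// reports false if any transaction is negative or overspends.
func Update(b Block, s State) (State, bool) {
	next := State{}
	for k, v := range s {
		next[k] = v
	}
	for _, tx := range b {
		if tx.Amount < 0 || next[tx.From] < tx.Amount {
			return nil, false
		}
		next[tx.From] -= tx.Amount
		next[tx.To] += tx.Amount
	}
	return next, true
}

// Joins reports whether applying the blocks in order to s1 yields s2,
// i.e. update(tk, ... update(t1, s1) ...) = s2.
func Joins(s1 State, blocks []Block, s2 State) bool {
	cur := s1
	for _, b := range blocks {
		next, ok := Update(b, cur)
		if !ok {
			return false
		}
		cur = next
	}
	return equal(cur, s2)
}

// equal compares two states, treating a missing account as balance zero.
func equal(a, b State) bool {
	for k, v := range a {
		if b[k] != v {
			return false
		}
	}
	for k, v := range b {
		if a[k] != v {
			return false
		}
	}
	return true
}

func main() {
	s1 := State{"alice": 10}
	blocks := []Block{
		{{From: "alice", To: "bob", Amount: 4}},
		{{From: "bob", To: "carol", Amount: 1}},
	}
	fmt.Println(Joins(s1, blocks, State{"alice": 6, "bob": 3, "carol": 1}))
}
```

Replaying costs time proportional to the whole history; the goal stated above is a certificate whose size and verification time depend only on the security parameter λ.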

Defining Immutable Data Storage

Storing data in a smart contract requires 32,000 ETH per 1GB of data. This is a fixed ratio and ignores the price of the token. We will show that the value of a token is correlated to the production value of the ecosystem. You can therefore not directly correlate production value as a fixed value towards storage.

Instead, a secondary storage marketplace is designed that allows for variable storage pricing secured by on-chain proofs.

Distributed token marketplace model, providers and consumers.

Off-chain storage, on-chain proofs.