Our first attempt was to take a hex trie after a block is executed and convert it to a binary one just before extracting the witness from it.

This approach has some benefits: it is easier to implement, and it is trivial to validate the hex-to-bin conversion.

Unfortunately, we ran into two problems, one of which was critical.

First of all, as it turned out, tries converted from hex contain more nodes than tries that are built as binary from the start. So, if we convert only at the end, we don’t get exactly the same witnesses we would get if we just had binary tries from the beginning.

Why is that?

That is due to the fact that hex tries always grow in height in half-byte (nibble) increments. Bin tries grow in 1-bit increments, which makes it possible to have keys whose length is not a multiple of 4 bits.

In practice, that means the converted witnesses contain some additional EXTENSION nodes and are ever so slightly bigger. But even for big blocks (~5000 transactions) this difference was quite small compared to the total witness size (< 5%).
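To make the key-length difference concrete, here is a minimal sketch (not turbo-geth’s actual key representation, just an illustration) of how a hex-trie key made of nibbles expands into a binary-trie key made of bits, and why hex-derived prefixes can only be multiples of 4 bits long:

```go
package main

import "fmt"

// nibblesToBits expands a hex-trie key (one nibble per byte, values 0..15)
// into a binary-trie key (one bit per byte, values 0 or 1). This is an
// illustrative sketch, not turbo-geth's actual key type.
func nibblesToBits(nibbles []byte) []byte {
	bits := make([]byte, 0, len(nibbles)*4)
	for _, n := range nibbles {
		for shift := 3; shift >= 0; shift-- {
			bits = append(bits, (n>>uint(shift))&1)
		}
	}
	return bits
}

func main() {
	// A 3-nibble hex key always becomes a 12-bit binary key: hex-trie
	// prefixes come only in multiples of 4 bits, while a native binary
	// trie can branch after any single bit.
	fmt.Println(nibblesToBits([]byte{0xA, 0x7, 0x3}))
	// Output: [1 0 1 0 0 1 1 1 0 0 1 1]
}
```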

What was critical was the performance. As the trie grew, the conversion became slower and slower.

To put it in numbers, on our Google Compute Engine VM the processing speed was around 0.16 blocks/second. That is less than 10 blocks per minute. At that rate, processing 1.000.000 blocks takes more than 2 months. Oops.
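Here is a quick back-of-the-envelope check of that estimate (assuming the measured ~0.16 blocks/second rate stays constant for the whole run):

```go
package main

import "fmt"

func main() {
	// Rough estimate: total processing time at the measured conversion speed.
	const blocksPerSecond = 0.16
	const totalBlocks = 1_000_000

	days := totalBlocks / blocksPerSecond / (60 * 60 * 24)
	fmt.Printf("%.1f blocks/min, ~%.0f days for %d blocks\n",
		blocksPerSecond*60, days, totalBlocks)
	// Output: 9.6 blocks/min, ~72 days for 1000000 blocks
}
```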

So we decided to go with a more complicated approach and built an experimental branch that uses binary tries natively. That means we replaced all hex tries in the turbo-geth codebase with bin tries, so blocks always run on binary tries.

On the downside, a few hash checks had to be ignored (the block root hash, and sometimes account storage hashes, due to a limitation of our blockchain snapshots mechanism).

But the main verification mechanism stayed the same: we should be able to execute blocks using bin tries and subtries generated from witnesses.

Let’s talk about keys.

For simplicity, we encode keys very inefficiently: 1 byte per nibble for hex keys and 1 byte per bit for binary keys. That simplified the code changes a lot, but the “keys” component of the block witness (see this article to learn what a witness consists of) is 8 times bigger than it would be if we used bitsets (as we should).
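As a rough illustration of that 8x overhead, here is a small sketch (a hypothetical packBits helper, not the encoder used in the experimental branch) of packing a one-byte-per-bit key into a bitset:

```go
package main

import "fmt"

// packBits packs a key stored as one byte per bit (each byte is 0 or 1)
// into a bitset that uses one bit of memory per bit of key -- the
// "optimal" encoding assumed in the analysis below.
func packBits(key []byte) []byte {
	packed := make([]byte, (len(key)+7)/8)
	for i, b := range key {
		if b != 0 {
			packed[i/8] |= 1 << uint(7-i%8)
		}
	}
	return packed
}

func main() {
	key := []byte{1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1} // 12-bit key, 12 bytes naively
	fmt.Printf("naive: %d bytes, packed: %d bytes\n", len(key), len(packBits(key)))
	// Output: naive: 12 bytes, packed: 2 bytes
}
```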

So, for the further analysis, I will assume that all keys are optimally encoded (1 bit of key information per 1 bit of memory).

Hex vs. Bin: Results

I ran the analysis on 2m blocks from the Ethereum mainnet in 2 intervals.

blocks 5.000.000–6.500.000

I will also provide commands so you can repeat the experiment using the Python scripts in the GitHub repo: https://github.com/mandrigin/ethereum-mainnet-bin-tries-data

First, let’s analyze our dataset a bit.