Merkle trees are used in the world of cryptocurrencies as an efficient way to prove the existence of a particular value within a large set of values and with minimal storage. This article introduces Merkle trees and shows how repeated proofs against the same tree can be reduced in size significantly by storing multiple levels of branches rather than just the root (known as “Merkle pollards”).

Hashes

A hash function turns a variable-size piece of data (in this case the name of a fruit) in to a fixed-size value (known as a “hash”). An example showing hashes of “Apple” and “Orange” are shown below:

Hashing values

Hash functions have various features, but the most important ones are that even slightly different values result in very different hashes, and that it is mathematically very hard to go from the hash back to the value (usually meaning there is no faster method than guessing a value, hashing it and seeing if it matches).

Merkle trees

A Merkle tree is a way of combining multiple values and their hashes to reduce them to a single fixed-size value.

At the top of the tree are the values, known as “leaves”. Each leaf is hashed to create a top-level branch, and adjacent branches are hashed together to create the intermediate branches. Eventually this will result in a single hash, known as the Merkle root. A graphical example of a Merkle tree is shown below:

A Merkle tree

The above example shows a Merkle tree with 8 values and 4 levels; the root is at the bottom labelled 0xd576...ffd9 .

As mentioned previously, even slightly different values will generate very different hashes. This change has a subsequent impact on all levels of the tree and will ultimately give a different root. For example, changing the single value “Peach” to “Pear” results in an altered Merkle tree as seen below:

The impact of a single change to the tree (shown in grey)

Merkle trees are reproducible: given the same values in the same order a Merkle tree will always have the same hashes for the branches and root.

Merkle paths

The Merkle path is simply the set of hashes from the value to the Merkle root. The example below shows the Merkle path for the value “Peach”:

The Merkle path for “Peach”

Merkle proofs

A Merkle proof is a way of proving that a certain value is part of a set of data without requiring any of the other values to be exposed.

A Merkle proof

A Merkle proof requires three things: a value (shown in red), the intermediate hashes (shown in green) and the Merkle root (shown in blue). For each value there will be a different set of intermediate hashes.

Merkle proofs are commonly used in blockchains to show that a value is in a set without requiring the entire set to be stored on the blockchain. For example, an Ethereum token contract might have a whitelist feature to allow select accounts to purchase tokens. Rather than store every whitelisted account on the blockchain, which would be prohibitively expensive if there are many thousands of accounts on the whitelist, a Merkle tree of the accounts can be created and just the root stored on the blockchain.

For example, if the root is stored in a smart contract it becomes easy for the contract to prove an account is on the whitelist: an account supplies the intermediate hashes (provided in some off-chain manner by the contract owner to the account holder) and the smart contract hashes the account with the intermediate hashes in order. If the result matches the Merkle root it knows the account is on the whitelist.

Note the relationship between the hashes in the Merkle path and the hashes in the proof shown in the last two diagrams: each hash in the proof is the sibling of the hash in the path at the same level in the tree. This shows graphically that the proof gives the ability to recreate the path for the value, which is why the end result will be the Merkle root.

As can be seen above some benefits of using Merkle proofs are:

the on-chain storage required is far smaller than if storing values

the full set of values are not exposed by storing them publicly on-chain

the cost of confirming the presence of a specific value in the set of values by confirming a proof can be lower (faster and cheaper) than checking against the set

Repeated proofs

In the above example each account only sends in the single proof required to verify they are on the whitelist. An alternative use of Merkle trees is as part of probabilistic proofs of knowledge (commonly known as STARKs), where each proof increases the likelihood that the creator of the Merkle tree (known as the “prover”) knows all of the values that make up the tree. In this situation the prover often generates hundreds of proofs against a single Merkle tree containing tens or even hundreds of thousands of values. The Merkle root and proofs are sent together to a verifier to confirm their validity.

Examining repeated proofs in the context of our original example, below are three proofs against the same tree: