While studying blockchain technology, you will come across the term ‘Merkle Tree’ in many places. The Merkle Tree is the backbone of the blockchain. To understand the basics of the blockchain, one should be familiar with the Merkle Tree and the related terminology. In this article, we cover the basics of the Merkle Tree and a simple implementation of it in Python.





A Merkle Tree is a special type of data structure that is built entirely using a cryptographic hash function. Before going deep into the Merkle Tree, let’s take a quick look at hash functions. A hash function converts input data of any length into output data of a fixed length. The output of the hash function is called the ‘hash value’, ‘hashcode’, or ‘hash’ for short. A given input always produces the same hash, and even a tiny change in the input produces a very different hash. Because the output length is fixed, two different inputs can in theory produce the same hash (a collision), but a good cryptographic hash function makes finding such a pair practically infeasible. Let’s see an example of the hash function.





Input Data    Hash Code
Hash Me       644D45DD9DF9D3B92E5D773E13DF5215
HashMe        54FF7B00CC8493E7948F5DDF08396CBC

When hashing the data ‘Hash Me’, a hexadecimal code is generated that is 16 bytes long (32 hexadecimal characters).

In the second example, the data to be hashed is ‘HashMe’. The only change we made is the removal of the space between the words, yet the resulting hash code is entirely different, while still having the same length of 16 bytes. Unlike encryption, a hash cannot be decoded back into the original data (plain text). This means hashing is a one-way cryptographic method; in other words, hashing is irreversible.
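
The behaviour described above can be tried out with Python's standard `hashlib` module. This is a minimal sketch, assuming the 32-character hashes in the example come from MD5 (the length matches, but the algorithm is not named) and that the input is encoded as UTF-8; the helper name `md5_hex` is made up for illustration.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 digest of `text` as an uppercase hex string."""
    # Assumption: the text is encoded as UTF-8 before hashing.
    return hashlib.md5(text.encode("utf-8")).hexdigest().upper()


for data in ("Hash Me", "HashMe"):
    digest = md5_hex(data)
    # Both digests have the same length even though the inputs differ in length.
    print(f"{data!r:10} -> {digest} ({len(digest)} hex characters)")
```

Running it prints two different 32-character codes, one for each input, which is the fixed-length and change-sensitive behaviour described above.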





Popular hash functions include MD5, SHA1, SHA256, and SHA512. Each of them follows a different cryptographic algorithm. SHA256 is the most popular one and is used by many blockchains, including Bitcoin.
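
To get a feel for how these functions differ, the sketch below hashes the same message with each of them and records the digest length in bytes. It uses `hashlib.new` from the Python standard library; the sample message and the `digest_sizes` dictionary are only illustrative.

```python
import hashlib

message = b"Hash Me"  # illustrative input
digest_sizes = {}

for name in ("md5", "sha1", "sha256", "sha512"):
    digest = hashlib.new(name, message).digest()  # raw bytes, not hex
    digest_sizes[name] = len(digest)
    print(f"{name:7} {len(digest):2d} bytes")
```

Whatever the input length, each algorithm always returns a digest of its own fixed size: 16 bytes for MD5, 20 for SHA1, 32 for SHA256, and 64 for SHA512.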





As already mentioned, a Merkle tree is built entirely using a hash function. In short, forming a Merkle tree is the process of deriving a single hash from a group of hashes. Let’s see how it works with an illustration.

The bottom blocks L1, L2, L3, and L4 are the data blocks. The first step is to hash each data block using a specific hashing algorithm, say SHA256, so that the blocks Hash 0-0, Hash 0-1, Hash 1-0, and Hash 1-1 are formed. This is the initial step in building a Merkle Tree. These blocks are called the Leaves of the Merkle Tree. There should be at least two Leaves, and there is no upper limit.

The next step is to combine each pair of adjacent hashes into a single hash. That is, Hash 0-0 is concatenated with Hash 0-1 and the result is hashed again with SHA256 to get Hash 0. The same process is applied to Hash 1-0 and Hash 1-1 to produce Hash 1. The process continues, level by level, until a single hash remains. This final hash is the Root of the Merkle Tree, called the Merkle Root.
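
The two steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the function names are made up, hashes are concatenated as hex strings (some systems concatenate raw bytes instead), and when a level has an odd number of hashes the last one is duplicated, which is one common convention that the text above does not specify.

```python
import hashlib


def sha256_hex(data: str) -> str:
    """Return the SHA256 digest of `data` as a hex string."""
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def merkle_root(data_blocks):
    """Hash every data block, then pair up hashes until one remains."""
    if not data_blocks:
        raise ValueError("at least one data block is required")
    # Step 1: the leaves are the hashes of the data blocks.
    level = [sha256_hex(block) for block in data_blocks]
    # Step 2: hash adjacent pairs, level by level.
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumption: duplicate the last hash
        level = [
            sha256_hex(level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]


blocks = ["L1", "L2", "L3", "L4"]  # the data blocks from the illustration
root = merkle_root(blocks)
print("Merkle Root:", root)
```

Changing any data block, or swapping the order of two blocks, produces a different Merkle Root, which is what makes the root a compact fingerprint of the whole data set.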

Validating that a piece of data exists in a Merkle Tree is an important process. The hash of the data to be validated is called the Target Hash. As already mentioned, hashes are irreversible. That is, it is impossible to derive the target hashes from the Merkle Root, so no data can be validated by decoding the Merkle Root. The only way to validate data in a Merkle tree is to rebuild the tree, and the Target Hash alone is not enough to do that. One method is to collect all the leaves, arrange them in the same order, and build the tree again. But this is impractical for many applications and costly in both time and storage. If we study the formation of a Merkle Tree in a little more depth, we can see that not all of the leaves are required to recompute the root. Instead, a proof that leads from the Target Hash to the Merkle Root can be formed. This proof is called a Merkle Proof. A Merkle Proof is nothing but a collection (array) of hashes, which is explained below.
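
As a rough preview of the idea, the sketch below builds a small tree, collects the sibling hashes along the path from one leaf to the root, and then recomputes the root from the Target Hash and those siblings only. Everything here is an assumption made for illustration: the function names, the hex-string concatenation, the duplication of the last hash on odd levels, and the choice to store each proof entry as a `(side, hash)` pair so the verifier knows the concatenation order.

```python
import hashlib


def sha256_hex(data: str) -> str:
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def build_levels(leaves):
    """Return all levels of the tree: leaves first, root last."""
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        level = list(levels[-1])
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumption: duplicate the last hash
        levels[-1] = level  # keep the padded level so siblings can be looked up
        levels.append([
            sha256_hex(level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ])
    return levels


def merkle_proof(levels, index):
    """Collect the sibling hash at every level below the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1  # the neighbour within the same pair
        side = "left" if sibling < index else "right"
        proof.append((side, level[sibling]))
        index //= 2  # position of the parent in the next level
    return proof


def verify(target_hash, proof, root):
    """Recompute the root from the target hash and its proof."""
    current = target_hash
    for side, sibling in proof:
        if side == "left":
            current = sha256_hex(sibling + current)
        else:
            current = sha256_hex(current + sibling)
    return current == root


leaves = [sha256_hex(block) for block in ["L1", "L2", "L3", "L4"]]
levels = build_levels(leaves)
root = levels[-1][0]
proof = merkle_proof(levels, 2)  # proof for the third leaf (L3)
print(verify(leaves[2], proof, root))
print(verify(sha256_hex("tampered"), proof, root))
```

For four leaves the proof holds only two hashes, so the verifier never needs the full list of leaves; the first check succeeds and the second, which uses a hash that is not in the tree, fails.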

