Merkle Trees

Cryptographic hash functions are often used to verify the integrity of a list of items, such as the broken-up chunks of a large download. In such cases, one option is to merge all the chunks and take the hash of the complete download. The problem with this is that if one chunk is corrupted, the user won’t find out until the entire download is complete, and even then they won’t know which chunk is corrupt. A better solution is to take the hash of each chunk individually so that each chunk can be verified as it comes in. However, if there are a large number of chunks, then there is a greater chance that some of the hash values will become corrupted. Furthermore, this is a lot of data for the trusted source to store. Ideally, a trusted source would only have to provide one hash, and the rest of the hashes could be downloaded from untrusted sources, such as peers in a peer-to-peer network. This can be accomplished using a top hash generated by hashing all of the hashes of the chunks. The resulting structure is called a hash list.

If the number of chunks is very large, the list of hashes of all the chunks might also be quite large. In order to verify just one chunk against the trusted top hash, one would need to obtain all of the hashes in the hash list. Ralph Merkle proposed the idea of a hash tree in 1979, which allows a chunk to be verified with only a logarithmic number of hashes.[6] In a hash tree, or Merkle tree, hashes are taken of all chunks as in a hash list, but then these hashes are paired and the hash of each pair is taken, and these hashes are then paired again, and so on until there is only “one hash at the top of this tree of hashes.

To verify the integrity of just one chunk, it is only necessary to obtain a small subset of the hashes in the hash tree. For any hash in the tree, if the desired chunk is not in the branch below it, then that branch can be stubbed out by dropping it and keeping only the hash at the top of that branch. For example, if you wanted to verify data block 1 in the diagram, you need Hash 0-0, Hash 0-1, Hash 0, Hash 1 and the Top hash. The branch rooted by Hash 1 can be stubbed out, removing Hash 1-0 and Hash 1-1, keeping just Hash 1 to represent the whole branch. For large trees, the number of hashes needed to verify one chunk can be much smaller than the number of chunks.

You'll only receive email when Emmanuel Amberber publishes a new post

More from Emmanuel Amberber