Why is the full Merkle path needed to verify a transaction?

If I have transaction A and wish to verify that it is in the block, and I also have the hash of transaction B and the hash of the hashes of A and B, then can't I just hash A and then see if what I get by hashing A's hash and B's hash is equal to what is in the Merkle tree? Why do I have go all the way to the root?

merkle-tree

Jeff

Posted 2017-01-09T17:23:06.233

Reputation: 213

Answers

If you have A, H_B and H_AB you can obviously check whether A fits. As Jestin noted this is essentially a full Merkle tree with two leaves.

However, as a thin client, you only have readily available the Merkle root (which is in the block header) and get told about A. The intermediate levels of the Merkle tree are not provided, therefore, to calculate them, you'd need the block's complete set of transactions.

_{Image via Mastering Bitcoin}

So, for a thin client, we calculate the Merkle branch instead. For the Merkle branch, we just need the transaction's position in the block's transaction list and the hashing partners at each level, instead of the complete set of transactions. A Merkle branch is impractical to fake because it would require finding of a hash collision (which is not doable, or mining wouldn't work). So by going up the tree and combining our result with the respective hashing partner at each level, we finally get the Merkle root. Thus we can prove membership of the transaction in the block.

In the example image, you only need to provide the blue information to link H_K to the Merkle root, whereas checking the whole Merkle tree would require all transactions from A to P and the Merkle root.

Murch

Posted 2017-01-09T17:23:06.233

Reputation: 41 609

But don't you need the path which in fact consists of intermediate levels? – Jeff – 2017-01-09T20:38:49.820

@Jeff: I've edited to clarify. :) – Murch – 2017-01-09T20:40:56.173

I assume the client who wants to verify the existence of the transaction tells the full node (?) the transaction id (?) and the full node then provides the path? What makes the client believe a given transaction is in a particular block? And when would we first detect that transaction K is not in the block? When you say "blue" I think you mean solid blue? – Jeff – 2017-01-09T20:48:17.077

Thin clients only get the block headers usually. They give a bloom filter to the full nodes which resolves to transactions they are interested in. In response full nodes tell them about transactions that are found with the bloom filter. The thin client then requests the Merkle branch for any transaction they are interested in. It is hard to prove that a transaction is not in a block, but it is easy to prove that it is. If nobody can provide proof that K is in the block, it probably isn't. –– Yes, I meant solid blue. – Murch – 2017-01-09T23:51:31.387

I am still not understanding why we need the full Merkle branch: why would not a single hashing partner be sufficient? – Jeff – 2017-01-10T17:09:38.140

2@Jeff: If you were trying to verify that K is in the block, and you only get HL, you can merely calculate HKL, but you don't know whether HKL is actually in the tree. To check that, you have to hash it with HIJ, but again, you don't know if the resulting HIJKL is in the tree. So, you go up until you can calculate the Root, because if your calculation matches the information that you have from the block header, you know that the transaction fits the block header. – Murch – 2017-01-10T22:32:48.740

@Jeff: Meanwhile, you're fairly sure that the block header is correct, because it adheres to the difficulty requirement and it would be quite the investment to fake a block header that adheres the difficulty requirement. Even better when you learn about new block headers that reference the block header of your transaction's block and also adhere the difficulty requirement. – Murch – 2017-01-10T22:34:13.187

Thanks for your explanation. Is there an example with a small tree that demonstrates this? – Jeff – 2017-01-13T05:25:43.567

@Jeff: The example I gave corresponds to the image in the answer. Or what do you mean? – Murch – 2017-01-13T17:18:32.633

Well, one question is, one you say: You don't know whether HKL is actually in the tree, does this mean that a "malicious" full node might deliberately give you a false hashing partner? – Jeff – 2017-01-13T23:35:20.923

1@Jeff: If they give us a wrong hashing partner, a few levels further up the result will not match the Merkle root, unless they managed a successful preimage attack i.e. found a second input that creates the same given output. The latter is thought to be infeasible for SHA-256d. – Murch – 2017-01-14T00:42:56.613

When starting with Hk and given Hl, how does one know which of (Hk + Hl) or (Hl + Hk) to compose when checking back up to the root? Does the Merkle Path include a branch-left/right indicator? – GoZoner – 2018-05-15T17:01:17.620

@GoZoner: Good question! The full information set includes the transaction you're getting the branch for, the transaction's position in the block's transaction list and the list of its hashing partners up to the Merkle root. You can calculate from the position whether it's the left or right hashing partner for each hashing partner. — I've updated my answer. – Murch – 2018-05-15T17:25:50.980

What you are proposing is essentially a Merkle tree with 2 transactions. The hash of the hashes is the Merkle root, and the hash of B is the rest of the Merkle path. You are going all the way to the root.

When you scale this up, you can provide the next highest node in the tree...but what verifies that it belongs in the tree? You'll need to provide the hashes all the way up in order to verify, because it's the Merkle root that is hashed into the proof of work algorithm.

Jestin

Posted 2017-01-09T17:23:06.233

Reputation: 8 339

<Jestin> But why would this fail if transaction A was one of 100s of transactions? If A were not in the block, then A's hash would not be in the hash of the hashes of A and B, right? – Jeff – 2017-01-09T18:07:05.297

@Jeff, are you saying that 100 hashes would be concatenated and then hashed? If so, that's a o(n) solution, compared to the o(log n) solution that a Merkle tree provides. – Jestin – 2017-01-09T18:10:13.760

I may not be /probably am not understanding the procedure which involves a "Merkle path" but it seems to me that the absence of A in the block would become apparent without having to use all of the nodes in the path. – Jeff – 2017-01-09T18:14:54.890

I think the part that you are missing is that intermediate nodes in the tree are not always available, only the root. Therefore, the path back to the root must also be provided. Check the "transaction-data" section of the developer guide (https://bitcoin.org/en/developer-guide#transaction-data), specifically the part about Simplified Payment Verification.

– Jestin – 2017-01-09T19:19:54.353

Merkle trees are used extensively by SPV nodes. SPV nodes don’t have all transactions and do not download full blocks, just block headers. In order to verify that a transaction is included in a block, without having to download all the transactions in the block, they use an authentication path, or a merkle path. To understand why we need authentication path, you need to understand how Merkle trees work.

A merkle path is used to prove inclusion of a data element. A node can prove that a transaction K is included in the block by producing a merkle path that is only four 32-byte hashes long (128 bytes total). The path consists of the four hashes (shown with a shaded background in A merkle path used to prove inclusion of a data element) HL, HIJ, HMNOP, and HABCDEFGH. With those four hashes provided as an authentication path, any node can prove that HK (with a black background at the bottom of the diagram) is included in the merkle root by computing four additional pair-wise hashes HKL, HIJKL, HIJKLMNOP, and the merkle tree root (outlined in a dashed line in the pic below) merkle path

user2203937

Posted 2017-01-09T17:23:06.233

Reputation: 89